Rationale and Objectives
This document details the modeling workflow implemented for
estimating dissolved and equilibrium N2O concentrations and saturation
ratios using the 2017 National Lakes Assessment (NLA) survey data. The NLA
sampling sites were distributed among the target population of US lakes
(in the lower 48 states) according to a probabilistic survey design with
samples stratified among categories of lake surface area, WSA9
ecoregion, and US state (excluding AK and HI). Due to the stratification
scheme, some types of lakes in the sample population were intentionally
over-represented (e.g., large lakes) and some were under-represented
(e.g., small lakes) relative to the target population. Because of the
unequal probability design, estimates from the sample had to be
adjusted before making inferences on the broader populations of interest (e.g.,
national, state-, ecoregion-, and size class-specific estimates).
The concept of the “complete data likelihood” is useful for
conceptualizing biases arising from sampling design (Zachmann et al. 2022; Gelman et al. 2014
Ch. 8; Link and
Barker 2010). For the NLA survey data, the population of US
lakes in the lower 48 states larger than 4 hectares was considered the
complete data and the probabilistic samples were considered a subset of
that complete data. The portion of US lakes not included in the sample
was considered “missing” from the complete data, not at random
but conditional on the pre-specified design (stratification) variables.
This non-random missingness was not ignorable for the purpose of making
inferences from the sample to the target population. In a model-based
framework, however, including the design parameters as predictors in a
regression model is one way to adjust for the missingness. For a
thorough and recent treatment of this concept in the context of national
surveys of environmental resources, refer to Zachmann et al.
(2022). This concept is a key motivator for the increasingly
popular multilevel regression with poststratification (MRP) approach to
model-based inference (Gelman et al. 2014; Gelman, Hill,
and Vehtari 2020 Ch. 17).
The following workflow illustrates our model-based approach, based
largely on the logic of MRP, but with an elaboration on the
poststratification step to enable eventual estimates of total gas flux
at the population level, which required scaling up from lake-level
estimates. The typical MRP process is carried out in two steps. The
first step is to fit regression models for the response variables of
interest (e.g., dissolved N2O, equilibrium N2O) conditional on the
survey design variables (i.e., ecoregion, state, lake size). The second
step is poststratification, wherein the posterior parameter estimates
from the regression model for the sample population are weighted based
on their known or assumed distribution in the population of interest
(i.e., the poststratification table; Gelman, Hill,
and Vehtari 2020 Ch. 17). The poststratification
table in our case, for example, would be a population summary of lakes
among the design variables: ecoregion, state, and size category.
However, because we eventually needed lake-level estimates, instead of
predicting to a poststratification table, we predicted to each individual
lake in the population of interest. This meant predicting to the full
target population of 465,897 natural and man-made US lakes larger than 4
hectares in the lower 48 states. These predictions were assumed relevant
to average conditions during the “index period” for each lake in 2017.
Details about the sampling frame as well as the target population are
further clarified in the workbook below with data summaries and
code.
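The contrast between predicting to a poststratification table and predicting to each lake can be sketched with a toy example. The cell estimates and lake counts below are made up for illustration and are not taken from the NLA data; the sketch only shows that, when predictions depend on the design cell alone, averaging lake-level predictions reproduces the count-weighted poststratified mean, whereas an unweighted mean of cells does not.

```r
# Toy poststratification sketch (base R). All numbers are hypothetical.
cells <- data.frame(
  size_cat    = c("4_10", "10_20", "50_max"),
  est_log_n2o = c(2.0, 2.2, 2.5),  # hypothetical fitted cell means (log scale)
  n_lakes     = c(700, 250, 50)    # hypothetical population counts per cell
)

# (a) Classic poststratification: weight cell estimates by population counts.
ps_mean <- with(cells, sum(est_log_n2o * n_lakes) / sum(n_lakes))

# (b) Lake-level version: expand to one row per lake, then average.
lakes <- cells[rep(seq_len(nrow(cells)), cells$n_lakes), ]
lake_mean <- mean(lakes$est_log_n2o)

# (c) Unweighted mean of cell estimates, which ignores population structure.
naive_mean <- mean(cells$est_log_n2o)

print(c(poststratified = ps_mean, lake_level = lake_mean, naive = naive_mean))
```

Working at the lake level costs more computation, but it leaves room for lake-specific quantities (e.g., surface area) when scaling up to totals such as flux.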
For the regressions, we used multilevel models fit in a fully
Bayesian fashion. Multilevel models are thought to work well in this
context because they provide regularized estimates along the design
groupings, which can improve out-of-sample inferences (McElreath 2020). Inferences for lake
types that may be missing from the sample, but are part of the
population of interest are also straightforward using this approach
(Gelman, Hill, and Vehtari 2020 Ch. 17; McElreath
2020). More information on these models, their specific
parameters, R code, fit evaluations, and resulting inferences are
presented in this document.
The overriding objective of the modeling effort was to provide
population level estimates for (1) dissolved and equilibrium N2O
concentrations; (2) the N2O saturation ratio (i.e., dissolved
N2O/equilibrium N2O); and (3) the proportion of under-saturated water
bodies (i.e., saturation ratio < 1). The estimates would also be used
to later estimate the total flux of N2O gas attributable to the target
population of lakes over the index period. The saturation ratio
estimates were calculated as a derived quantity based on the ratio of
modeled dissolved to equilibrium N2O. Because dissolved and equilibrium
N2O were observed on the same sample units (lake sites), we developed
models for estimating their joint distribution. The response variable in
the models was, therefore, multivariate to account for potential
statistical dependencies between dissolved and equilibrium N2O due to,
for example, common dependencies on geography. Although point
predictions of the mean marginal probabilities from separate models
could be comparable, a joint model allowing correlated observation-level
errors (i.e., residuals) was expected to better capture uncertainty and
potentially improve out-of-sample predictions, should the variables be
conditionally correlated (Warton et al.
2015; Poggiato et al. 2021). All of the models
fit were constructed using the brms package (Bürkner 2017) in R (R
Core Team 2021) as an interface to Stan, a software package
for fitting fully Bayesian models via Hamiltonian Monte Carlo (HMC;
Stan Development Team 2018a, 2018b, 2018c).
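As a rough illustration of why the saturation ratio and the proportion of under-saturated lakes can be treated as derived quantities, the sketch below simulates correlated log-scale draws for the two responses and computes both summaries draw by draw. The means, standard deviations, and correlation are hypothetical placeholders, not estimates from the fitted models.

```r
# Sketch: derived quantities from joint draws of two log-scale responses.
# All parameter values are hypothetical.
set.seed(1)
n_draws <- 10000
rho <- 0.4  # assumed correlation between the two log-scale responses

# Build correlated standard normal pairs from independent draws.
z1 <- rnorm(n_draws)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n_draws)

log_n2o    <- 2.0 + 0.3 * z1  # dissolved N2O (log scale)
log_n2o_eq <- 2.1 + 0.1 * z2  # equilibrium N2O (log scale)

# Saturation ratio = dissolved / equilibrium, computed for every draw.
sat_ratio <- exp(log_n2o - log_n2o_eq)

# Proportion of draws that are under-saturated (ratio < 1).
prop_undersat <- mean(sat_ratio < 1)

print(round(c(median_ratio = median(sat_ratio),
              prop_undersat = prop_undersat), 3))
```

Because the ratio is computed per draw, any dependence between the two responses carries through to the uncertainty of the derived quantity.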
Data
As explained in a previous data munging document (https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dgIndicatorAnalysis.html),
duplicate dissolved gas samples were collected at a depth of ~0.1m at
designated index sites distributed across 1091 lakes nationwide, of
which 95 were sampled twice as repeat visits. This randomly selected
subset of revisit sites was used as a test set for assessing model fit
and out-of-sample performance.
Gas samples were analyzed via gas chromatography and concentrations
were recorded to the nearest 0.001 nmol/L. The samples were collected
under a stratified, unequal probability design and each gas observation
was indexed to an individual lake selected with unequal probability from
5 different lake size categories, \(l = 1, \dots, L\) with \(L = 5\), according to surface area (ha), and from within
a state, \(k = 1, \dots, K\) with \(K = 48\),
situated within an aggregated WSA9 or Omernik ecoregion, \(j = 1, \dots, J\) with \(J = 9\). All 9 WSA9
ecoregions were represented in the sample, including Xeric (XER),
Western Mountain (WMT), Northern Plains (NPL), Southern Plains (SPL),
Temperate Plains (TPL), Coastal Plains (CPL), Upper Midwest (UMW),
Northern Appalachian (NAP), and Southern Appalachian (SAP) regions. As
shown below, the data from the initial and revisit samples were
separately compiled into data frame objects in \(\textbf{R}\), with \(n=984\) and \(n=95\) rows, respectively, of gas
observations indexed to the survey design variables and several
potentially relevant covariates.
Import
The gas data and covariates were previously described and munged at
https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dataMunge.html.
That dataset was imported below.
load( file = paste0( localPath,
"/Environmental Protection Agency (EPA)/",
"ORD NLA17 Dissolved Gas - Documents/",
"inputData/dg.2021-02-01.RData")
)
save(dg, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")
From the imported dataset, a new data frame for modeling was
constructed from the original file including only the variables of
interest: (1) the N2O gas observations; (2) the survey design variables
indexed to those observations; and (3) additional covariates considered
potentially useful for improving the fit of the model. The data frame
below excluded the second-visit observations, which would later be used
for model checking. Some variables from the imported data were renamed
for convenience. In addition, the NO3 covariate was rounded according to
the documented measurement precision. An alternative version of the NO3
covariate was also created in this step by log-transforming and
re-coding it as an ordered factor with five levels at hand-drawn cut
points. The left-most cut point separated observations below the
detection limit from the completely observed samples. The remaining cut
points in the positive direction were drawn at approximately equal
distances along the log scale. Finally, it should be noted that one lake
that was sampled was missing information on the N2O gas measurements and
it was removed from the data frame.
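The recoding of NO3 into ordered categories can be illustrated with a few made-up concentrations. The sketch uses the same log-scale break points as the workbook code (-7.5, -5.5, -3.5, -1.5); the example values themselves are hypothetical and chosen so that one falls in each category.

```r
# Sketch: recode NO3 into five ordered categories on the log scale.
# The concentrations are hypothetical; the breaks match the workbook code.
no3 <- c(0.0005, 0.002, 0.01, 0.1, 1)

no3_cat <- cut(log(no3),
               breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
               labels = 1:5,
               ordered_result = TRUE)

print(data.frame(no3, log_no3 = round(log(no3), 2), no3_cat))
```

The first interval, up to -7.5 on the log scale, captures values at the substituted detection-limit value of 0.0005, since log(0.0005) is about -7.6.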
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")
dg %>%
filter(!is.na(dissolved.n2o.nmol)) %>% # 1 obs with missing measurement
nrow() # number of observations before filtering
[1] 1185
df_model <- dg %>%
filter(!is.na(dissolved.n2o.nmol)) %>%
filter(sitetype == "PROB") %>% # probability samples only
filter(visit.no == 1) %>%
mutate(n2o = round(dissolved.n2o.nmol, 2),
n2o_eq = round(sat.n2o.nmol, 2),
n2o_sat = n2o.sat.ratio,
n2o_em = e.n2o.nmol.d,
n2o_flux = f.n2o.m.d,
WSA9 = factor(ag.eco9),
state = factor(state.abb[match(state.nm, state.name)]),
area_ha = area.ha,
log_area = log(area_ha),
chla = chla.result,
log_chla = log(chla),
elev = elevation,
log_elev = log(elev + 1),
do_surf = o2.surf,
log_do = log(do_surf),
bf_max = max.bf,
sqrt_bf = sqrt(bf_max),
size_cat = recode(area.cat6,
"(1,4]" = "min_4" ,
"(10,20]" = "10_20",
"(20,50]" = "20_50",
"(4,10]" = "4_10",
">50" = "50_max")) %>%
mutate(size_cat = factor(size_cat,
levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
ordered = TRUE)) %>%
mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
labels =seq(1, 5, 1))) %>%
mutate(no3_cat = factor(no3_cat,
levels = seq(1, 5, 1),
ordered = TRUE)) %>%
mutate(date = as.Date(date.col)) %>%
mutate(jdate = as.numeric(format(date, "%j"))) %>%
mutate(lat = map.lat.dd,
lon = map.lon.dd) %>% # longitude
mutate(surftemp = surftemp,
log_surftemp = log(surftemp)) %>%
select(WSA9,
state,
size_cat,
site.id,
lat,
lon,
date,
jdate,
surftemp,
log_surftemp,
area_ha,
log_area,
elev,
log_elev,
chla,
log_chla,
do_surf,
log_do,
bf_max,
sqrt_bf,
n2o,
n2o_eq,
no3,
no3_cat
)
save(df_model, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
nrow(df_model) # number of obs after filtering
[1] 984
print(df_model)
A second dataframe, including only the second visit observations, was
constructed below. These data were later used as a “test set” to assess
the out-of-sample fit of the model developed on the first-visit or
training data.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")
# number of observations before filtering probability samples
dg %>%
filter(!is.na(dissolved.n2o.nmol)) %>% # remove obs with missing response measurements
nrow()
[1] 1185
df_test <- dg %>%
filter(!is.na(dissolved.n2o.nmol)) %>%
filter(sitetype == "PROB") %>% # probability samples only
filter(visit.no == 2) %>%
mutate(n2o = round(dissolved.n2o.nmol, 2),
n2o_eq = round(sat.n2o.nmol, 2),
n2o_sat = n2o.sat.ratio,
n2o_em = e.n2o.nmol.d,
n2o_flux = f.n2o.m.d,
WSA9 = factor(ag.eco9),
state = factor(state.abb[match(state.nm, state.name)]),
area_ha = area.ha,
log_area = log(area_ha),
chla = chla.result,
log_chla = log(chla),
elev = elevation,
log_elev = log(elev + 1),
do_surf = o2.surf,
log_do = log(do_surf),
bf_max = max.bf,
sqrt_bf = sqrt(bf_max),
size_cat = recode(area.cat6,
"(1,4]" = "min_4" ,
"(10,20]" = "10_20",
"(20,50]" = "20_50",
"(4,10]" = "4_10",
">50" = "50_max")) %>%
mutate(size_cat = factor(size_cat,
levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
ordered = TRUE)) %>%
mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
labels =seq(1, 5, 1))) %>%
mutate(no3_cat = factor(no3_cat,
levels = seq(1, 5, 1),
ordered = TRUE)) %>%
mutate(date = as.Date(date.col)) %>%
mutate(jdate = as.numeric(format(date, "%j"))) %>%
mutate(lat = map.lat.dd,
lon = map.lon.dd) %>% # longitude
mutate(surftemp = surftemp,
log_surftemp = log(surftemp)) %>%
select(WSA9,
state,
size_cat,
site.id,
lat,
lon,
date,
jdate,
surftemp,
log_surftemp,
area_ha,
log_area,
elev,
log_elev,
chla,
log_chla,
do_surf,
log_do,
bf_max,
sqrt_bf,
n2o,
n2o_eq,
no3,
no3_cat
)
save(df_test, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
nrow(df_test) # number of obs after filtering for probability samples and second visits
[1] 95
print(df_test)
Target population
Below, the NLA sampling frame was imported and then filtered to
include only the target population or sampling frame for this
project.
df_pop <- read.csv(file = paste0(localPath,
"/Environmental Protection Agency (EPA)/",
"ORD NLA17 Dissolved Gas - Documents/",
"inputData/NLA_Sample_Frame.csv"), header = T)
sframe <- df_pop %>%
filter(nla17_sf != "Exclude2017") %>%
filter(nla17_sf != "Exclude2017_Include2017NH") %>%
filter(state != "DC") %>%
filter(state != "HI") %>%
droplevels() %>%
mutate(WSA9 = factor(ag_eco9),
WSA9 = forcats::fct_drop(WSA9), # remove NA level
state = factor(state),
size_cat = factor(area_cat6),
lat = lat_dd83,
lon = lon_dd83,
log_area = log(area_ha),
elev = elevation,
log_elev = ifelse(elev <= 0, 0, elev), # assumed elev < 0 to be elev = 0
log_elev = log(log_elev + 1)
) %>%
mutate(size_cat = recode(size_cat,
"(1,4]" = "min_4" ,
"(10,20]" = "10_20",
"(20,50]" = "20_50",
"(4,10]" = "4_10",
">50" = "50_max")) %>%
mutate(size_cat = factor(size_cat,
levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
ordered = TRUE)) %>%
select(WSA9, state, size_cat, lat, lon, area_ha, log_area, elev, log_elev)
rm(df_pop)
save(sframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
print(sframe)
The resulting target population above included a total of 465,897
waterbodies.
Cross tabulations below describe the structure of the target
population with respect to the design variables. The cross-tabulation
makes it clear that not every ecoregion contains every state.
Therefore, in the statistical sense, states were nested in
ecoregions.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
sframe %>%
group_by(WSA9, state) %>%
summarise(n = n(), .groups = "drop") %>%
spread(state, n) %>%
print()
Likewise, lake size category was nested in state (which was nested in
ecoregion). That is, not every ecoregion:state in the population of
interest contained every size category (below).
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
sframe %>%
group_by(WSA9, state, size_cat) %>%
summarise(n = n(), .groups = "drop") %>%
spread(size_cat, n) %>%
print()
Below, the sampling frame was collapsed to create a
poststratification table. Some of the variables were renamed to match
the naming conventions used in the observational data above. There were
536 types of lakes in the population of interest with respect to the
sampling design. The counts of those lake types (n_lakes) and their
proportions relative to the total population of lakes in the sampling
frame (prop_cell) are indicated below.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
pframe <- sframe %>%
mutate(obs = 1) %>%
group_by(WSA9, state, size_cat) %>%
summarise(n_lakes = sum(obs), .groups = "drop") %>%
ungroup() %>%
mutate(prop_cell = n_lakes/sum(n_lakes)) %>%
mutate(type = "population")
save(pframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
print(pframe)
Sample vs. population
Below, the lake distributions in the population of interest were
compared to the proportions in the observed sample. There were 352 lake
types in the sample compared to the 536 in the population of
interest. In total, there were 984 observations distributed across these
352 lake types in the sample; and the number of samples was not
distributed evenly across the types. Some cells were represented by as
few as 1 lake. In total, 536-352 = 184 lake types in the population of
interest were not represented in the sample.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
samp_props <- df_model %>%
mutate(obs = 1) %>%
group_by(WSA9, state, size_cat) %>%
summarize(n_lakes = sum(obs), .groups = "drop") %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes / sum(n_lakes), 7)) %>%
mutate(type = "sample")
save(samp_props, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")
print(samp_props)
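One way to identify which population cells have no sampled lakes is a left join of the population cells onto the sample cells. The sketch below uses small, invented tables in base R; the column name `n_samp` and all counts are assumptions for illustration, not the workbook's objects.

```r
# Sketch: find population cells with no sampled lakes (toy data, base R).
pop_cells <- data.frame(
  WSA9     = c("CPL", "CPL", "NAP", "NAP"),
  state    = c("FL", "FL", "ME", "ME"),
  size_cat = c("4_10", "50_max", "4_10", "50_max"),
  n_lakes  = c(900, 40, 300, 60)   # hypothetical population counts
)
samp_cells <- data.frame(
  WSA9     = c("CPL", "NAP", "NAP"),
  state    = c("FL", "ME", "ME"),
  size_cat = c("50_max", "4_10", "50_max"),
  n_samp   = c(3, 2, 5)            # hypothetical sample counts
)

# Left join: cells absent from the sample get NA for n_samp, set to 0 here.
both <- merge(pop_cells, samp_cells,
              by = c("WSA9", "state", "size_cat"), all.x = TRUE)
both$n_samp[is.na(both$n_samp)] <- 0

# Cell proportions in the population and in the sample.
both$prop_pop  <- both$n_lakes / sum(both$n_lakes)
both$prop_samp <- both$n_samp / sum(both$n_samp)

unsampled <- both[both$n_samp == 0, ]
print(both)
print(nrow(unsampled))  # number of population cells missing from the sample
```

In this toy case the unsampled cell is also the most common lake type in the population, which is the kind of mismatch a model-based adjustment is meant to handle.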
Below, a graphical comparison was constructed to depict the
distribution of cells in the population of interest versus
those in the sample.

Another comparison between population and sample was constructed
below by ecoregion. The samples were not balanced across ecoregions.
Lakes in the Coastal Plains (CPL) ecoregion, for example, were clearly
undersampled relative to their proportion of the population.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
pframe_eco <- pframe %>%
group_by(WSA9) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'population')
save(pframe_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")
samp_props_eco <- samp_props %>%
group_by(WSA9) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'sample')
save(samp_props_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")

A similar comparison by state was constructed below.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
pframe_state <- pframe %>%
group_by(state) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'population')
save(pframe_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")
samp_props_state <- samp_props %>%
group_by(state) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'sample')
save(samp_props_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")

Finally, a comparison by lake size category is shown below. Note that
small lakes were under-sampled relative to larger lakes by design.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
pframe_size <- pframe %>%
group_by(size_cat) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'population')
save(pframe_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")
samp_props_size <- samp_props %>%
group_by(size_cat) %>%
summarise(n_lakes = sum(n_lakes)) %>%
ungroup() %>%
mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
ungroup() %>%
mutate(type = 'sample')
save(samp_props_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")

Sample-based estimates
The overall mean and standard deviation for N2O in the sample:
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
summarise(mean = mean(n2o),
sd = sd(n2o)) %>%
print()
The same summary for equilibrium N2O:
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
summarise(mean = mean(n2o_eq),
sd = sd(n2o_eq)) %>%
print()
The saturation ratio (i.e., N2O / N2O-eq):
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
summarise(mean = mean(n2o / n2o_eq),
sd = sd(n2o / n2o_eq)) %>%
print()
Finally, roughly 67% of lakes in the sample were undersaturated
(i.e., saturation ratio < 1):
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
summarise(prop_undersat = mean((n2o / n2o_eq) < 1)) %>% # share of observations with ratio < 1
print()
Using only the sample observations, a plot was constructed below of
the overall mean (dashed line) along with the ecoregion-specific means
(black circles). The shaded areas indicate +/- 1 standard deviation.
Neither dissolved N2O nor the saturation ratio was clearly structured
by ecoregion in the sample, but there did appear to be some structure in
the equilibrium N2O observations.

The same summary by state is below.

Finally, the same summary by size category:

Sample data exploration
Below, the empirical distribution of N2O observations in the sample
was summarized using a density and rug plot. Note the natural log
scale of the x axis. Both the N2O and equilibrium N2O data had
considerable right skew even after the log transformation, which was not
unexpected and has been noted in other studies (Webb et al. 2019). The saturation ratio
was also skewed since it was derived from the other two observed
variables (i.e., sat_ratio = n2o / n2o_eq).

Below are plots of N2O vs. NO3. The first plot shows log(N2O)
vs. log(NO3), as well as the ordinal categories assigned to NO3
(vertical lines). The leftmost vertical line is dashed and separates the
NO3 observations below the detection limit.

In the plot above, the trend is increasing and nonlinear on the log
scale. The increasing variance in N2O along the NO3 gradient suggested
a potential mediator of the relationship between NO3 and N2O. Below are
plots of N2O vs. NO3 for 6 quantiles of the surface temperature
measurements (quantiles increasing from 1 to 6). The plot below
suggested that the NO3 effect on N2O may have been stronger in lakes
with higher observed temperatures.

The next plot below shows the relationship between N2O and NO3 at 6
different quantiles (increasing 1 to 6) of the log-scaled lake surface
area estimates.

Similar plots are below, but with NO3 expressed as an ordered
categorical variable with 5 levels. The positive and monotonic trends
are similar to the previous plots where NO3 was treated as continuous.
Note the large number of observations in the first NO3 category (no3_cat
= 1). This category represented all of the censored observations for
NO3, which was most of the data.



Below is a plot of log(N2O) vs. log(NO3) by ecoregion, which
suggested that the NO3 effect on N2O may have varied by ecoregion.

Below is the same plot as above but for the ordered categorical
version of NO3.

A plot below shows trends by state within just the Temperate Plains
(TPL) ecoregion. Within states, the number of observations was
relatively small, but the trends appeared closer to linear.

Model fitting
The first regression model was constructed to estimate the joint
distribution of log-transformed N2O and equilibrium N2O conditional on
the design factors. Each log-transformed observation, \(i = 1, \dots, N\) with \(N = 984\), for each response,
\(p = 1, \dots, P\) with \(P = 2\), was assumed to be
drawn from a multivariate normal distribution with the parameters \(\nu\) and \(\Sigma\), where \(\nu\) is the multivariate mean estimated
conditional on the design effects and \(\Sigma\) is a covariance matrix containing
the observation-level variances and residual correlation: \[Y \sim MVN(\nu, \Sigma)\]
The multivariate mean is a vector of mean parameters, \(\nu = [\mu_{p=1}, \mu_{p=2}]\), one for each
response. Each mean is further defined by a linear combination of
parameters where, for each response \(p\) and observation \(i\):
\[\mu_{pi} = \alpha_{0(pi)} +
\alpha_{1(pij)} + \alpha_{2(pijk)} + \alpha_{3(pijkl)} \\
\alpha_1 \sim MVN(0, \Lambda_1) \\
\alpha_2 \sim MVN(0, \Lambda_2) \\
\alpha_3 \sim MVN(0, \Lambda_3)\]
The linear combination of parameters defining \(\mu\) above includes a fixed global
intercept, \(\alpha_0\), that is estimated
directly from the data, and three separate, latent group-level effects
matrices, \(\alpha_1, \alpha_2,
\alpha_3\). The group effects were assumed to be multivariate
normal and centered on zero in multivariate space. The spread of the
effects around zero is determined by a covariance matrix, \(\Lambda_1, \Lambda_2, \text{or }
\Lambda_3\), each of which is estimated directly from the data. These
covariance terms are further defined where:
\[\Lambda = \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix} \chi \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix}\]
The \(\tau\) parameters are the
group-level scale parameters, which constrain the spread of effects for
each response, and \(\chi\) comprises
the group-level residual correlation matrix:
\[\chi = \begin{pmatrix} 1 & \varrho
\\ \varrho & 1 \end{pmatrix}\]
wherein \(\varrho\) is the
group-level residual correlation between responses.
The explicit indexing in the notation above conveys the relationship
between the parameters and each observation, \(i\), and emphasizes the nested structure of
the observations within the group effects. Specifically, every
observation, \(i\), was nested in a
lake size category, \(l\), which was
nested in a state, \(k\), and
ecoregion, \(j\). The parameter \(\alpha_1\), therefore, accounted for
ecoregion-scale group effects or deviations from the global mean; \(\alpha_2\) accounted for state-level group
effects nested in ecoregions; and \(\alpha_3\) accounted for lake size group
effects within states and ecoregions.
Finally, the observation-level covariance term, \(\Sigma\), was parameterized as: \[\Sigma = \begin{pmatrix} \sigma_{p=1} & 0 \\
0 & \sigma_{p=2} \end{pmatrix} \Omega \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix}\]
wherein the \(\sigma\) parameters
are the observation-level standard deviations for each response and
\(\Omega\) comprises the
observation-level residual correlation matrix: \[\Omega = \begin{pmatrix} 1 & \rho \\ \rho
& 1 \end{pmatrix}\] wherein \(\rho\) is the residual correlation between
responses.
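The standard way to assemble a covariance matrix from per-response standard deviations and a correlation matrix is to place the standard deviations on the diagonal of a scaling matrix and multiply on both sides of the correlation matrix. A minimal numerical sketch follows; the standard deviations and correlation are hypothetical values, not fitted estimates.

```r
# Sketch: build a 2 x 2 covariance matrix from SDs and a correlation.
# The numeric values are hypothetical.
sigma <- c(0.30, 0.10)  # observation-level SDs for the two responses
rho   <- 0.4            # residual correlation between responses
Omega <- matrix(c(1, rho,
                  rho, 1), nrow = 2, byrow = TRUE)

# Covariance = diag(SDs) %*% correlation %*% diag(SDs)
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)
print(Sigma)

# The decomposition is reversible: SDs and correlation are recovered from Sigma.
print(sqrt(diag(Sigma)))
print(cov2cor(Sigma))
```

The same construction applies to the group-level covariance terms, with the group-level scale parameters in place of the observation-level standard deviations.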
For model fitting, priors were needed for all parameters conditioned
directly on the data, which included the global intercept, the scale
parameters, and the correlation matrices. A normal or Gaussian prior,
\(N(\mu = 2, \sigma = 1)\) centered
near the (log-scale) data means, was used for the global intercept
parameter for each response. This prior was considered minimally
informative as it placed most (~80%) of the prior mass over values
between approximately 2 and 27 nmol/L for median N2O or N2O equilibrium
concentration and included support in the tails for values approaching 0
nmol/L on the lower end and 80 nmol/L on the high end. We placed \(Exp(2)\) priors over all scale parameters,
which placed most of the support between values very close to 0 and
values near 1 (central 80% density interval from approximately 0.05 to
1.15). Finally, for the correlation matrices, an \(LKJ(\eta =2)\) prior was used, which, for a
2-dimensional response, placed most support for correlations between
approximately -0.9 and 0.9. This prior seemed reasonable as there were no
clear causal mechanisms that were thought to ensure a strong direct
correlation between the N2O measures. Any potential residual dependence
was expected to be indirect due to, for example, a common causal factor
(e.g., elevation, temperature). For more information on prior choice
recommendations in Stan, see: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
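The implications of these priors can be checked directly with base R quantile and distribution functions. The sketch below computes central 80% intervals for the intercept and scale priors, and the prior probability that a correlation falls between -0.9 and 0.9. It relies on the fact that, in two dimensions, an LKJ(\(\eta\)) prior implies a density proportional to \((1 - r^2)^{\eta - 1}\) for the single correlation \(r\), so that \((r + 1)/2\) follows a Beta(2, 2) distribution when \(\eta = 2\).

```r
# Sketch: check what the stated priors imply (base R only).

# Intercept prior N(2, 1) on the log scale:
# central 80% interval, back-transformed to the concentration scale.
intercept_80 <- exp(qnorm(c(0.10, 0.90), mean = 2, sd = 1))

# Scale prior Exp(rate = 2): central 80% interval.
scale_80 <- qexp(c(0.10, 0.90), rate = 2)

# LKJ(eta = 2) in two dimensions: (r + 1) / 2 ~ Beta(2, 2).
# Prior probability that the correlation lies in (-0.9, 0.9).
p_cor_within <- pbeta((0.9 + 1) / 2, 2, 2) - pbeta((-0.9 + 1) / 2, 2, 2)

print(round(intercept_80, 2))
print(round(scale_80, 3))
print(round(p_cor_within, 3))
```

Prior predictive simulation (e.g., the commented-out `sample_prior = "only"` option in the model code) is a fuller check, since it shows the priors' joint implications for the data rather than one parameter at a time.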
The \(\textbf{brms}\) package (Bürkner 2017) for \(\textbf{R}\) (R Core Team
2021) was used to fit all of the models in a fully Bayesian
setting. The formula syntax of the \(\textbf{brms}\) package is similar to the
syntax used in the \(\textbf{lme4}\)
package that is widely used to fit mixed effects models in frequentist
settings. In either package, the linear predictor for \(\mu\) described above could be expressed
as:
\[\sim 1 + (1|WSA9) + (1|WSA9:state) +
(1|WSA9:state:size)\]
In the \(\textbf{brms}\) package,
there is additional functionality and syntax for multivariate responses
and for allowing the varying intercepts in a multivariate model to be
correlated, e.g.:
\[ N_2O_{dissolved}\sim 1 + (1|a|WSA9) +
(1|b|WSA9:state) + (1|c|WSA9:state:size) \\
N_2O_{equilibrium}\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) +
(1|c|WSA9:state:size)\]
The above syntax indicates that the linear predictors for both
responses in the multivariate model have the same group-level varying
effects, and that each of those effects is allowed to be correlated
between responses.
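To make the grouping terms concrete, the sketch below counts the distinct levels that each term would index in a small, invented data frame. It assumes, as in the formula syntax above, that a term such as `WSA9:state` defines one group per observed combination of its variables; the rows are hypothetical.

```r
# Sketch: grouping terms correspond to unique combinations of design variables.
# Toy data; the rows are hypothetical.
d <- data.frame(
  WSA9     = c("TPL", "TPL", "TPL", "UMW", "UMW"),
  state    = c("IA", "IA", "MN", "MN", "MN"),
  size_cat = c("4_10", "50_max", "4_10", "4_10", "4_10")
)

# Number of groups indexed by each term.
n_eco       <- length(unique(d$WSA9))                 # (1 | WSA9)
n_eco_state <- nrow(unique(d[, c("WSA9", "state")]))  # (1 | WSA9:state)
n_cell      <- nrow(unique(d))                        # (1 | WSA9:state:size_cat)

print(c(WSA9 = n_eco,
        `WSA9:state` = n_eco_state,
        `WSA9:state:size_cat` = n_cell))
```

Here the state MN appears in two ecoregions and so contributes two separate `WSA9:state` groups, which is how a state that spans ecoregions is handled under this nested coding.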
For the remainder of this document, only this simplified syntax is
presented to describe the model parameterizations. For more information
on \(\textbf{brms}\) functionality and
syntax with multivariate response models, the package vignette may be
helpful, and can be found at: https://cran.r-project.org/web/packages/brms/vignettes/brms_multivariate.html.
Model 1
The first model fit was the one described above.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(log(n2o) ~ 1 +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
family = gaussian())
bf_n2oeq <- bf(log(n2o_eq) ~ 1 +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
family = gaussian())
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "logn2o"), # centered near data mean
prior(exponential(2), class = "sd", resp = "logn2o"),
prior(exponential(2), class = "sigma", resp = "logn2o"),
prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), # centered near data mean
prior(exponential(2), class = "sd", resp = "logn2oeq"),
prior(exponential(2), class = "sigma", resp = "logn2oeq"),
prior(lkj(2), class = "rescor"),
prior(lkj(2), class = "cor")
)
n2o_mod1 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.99, max_treedepth = 14),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 145,
chains=4,
iter=5000,
cores=4)
save(n2o_mod1, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
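As a rough check on what the priors above imply, a prior predictive simulation for the log(N2O) response can be sketched in base R. This is a simplified stand-in for refitting with `sample_prior = "only"` (commented out in the call above): it treats the response on its own, ignores the LKJ correlation priors, and simulates one hypothetical lake per prior draw. The object names are illustrative.

```r
set.seed(1)
n_draws <- 4000

# Draws from the Model 1 priors for the log(N2O) response.
intercept <- rnorm(n_draws, mean = 2, sd = 1)   # normal(2, 1)
sd_wsa9   <- rexp(n_draws, rate = 2)            # exponential(2)
sd_state  <- rexp(n_draws, rate = 2)
sd_size   <- rexp(n_draws, rate = 2)
sigma     <- rexp(n_draws, rate = 2)

# One simulated lake per prior draw: a new group effect at each of the
# three levels plus residual noise.
log_n2o_sim <- intercept +
  rnorm(n_draws, 0, sd_wsa9) +
  rnorm(n_draws, 0, sd_state) +
  rnorm(n_draws, 0, sd_size) +
  rnorm(n_draws, 0, sigma)

# Central 95% interval and median of the implied log(N2O) values.
prior_pred_summary <- quantile(log_n2o_sim, probs = c(0.025, 0.5, 0.975))
print(round(prior_pred_summary, 2))
```

If the simulated interval covered values far outside any plausible range of log-concentrations, that would suggest the priors were looser than intended.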
Summarize fit
The summaries of the estimated parameters and key HMC convergence
diagnostics for the fitted model are printed below. There were no
obvious issues with the HMC sampling. All \(\hat{R}\) values were less than 1.01 and
effective sample size (\(ESS\))
calculations suggested that the posterior contained a sufficient number
of effective samples for conducting inference.
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: log(n2o) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
log(n2o_eq) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2oeq ~ normal(2, 1)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sigma_logn2o ~ exponential(2)
<lower=0> sigma_logn2oeq ~ exponential(2)
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.04 0.03 0.00 0.13 1.00
sd(logn2oeq_Intercept) 0.06 0.02 0.03 0.11 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.05 0.44 -0.80 0.83 1.01
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 2475 4230
sd(logn2oeq_Intercept) 3326 5332
cor(logn2o_Intercept,logn2oeq_Intercept) 1141 3129
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.26 0.03 0.21 0.32 1.00
sd(logn2oeq_Intercept) 0.04 0.01 0.03 0.05 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.24 0.15 -0.07 0.52 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 3199 4775
sd(logn2oeq_Intercept) 3549 5476
cor(logn2o_Intercept,logn2oeq_Intercept) 3332 5553
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.09 0.04 0.01 0.16 1.00
sd(logn2oeq_Intercept) 0.02 0.01 0.00 0.03 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) -0.22 0.37 -0.85 0.58 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 863 1397
sd(logn2oeq_Intercept) 928 1365
cor(logn2o_Intercept,logn2oeq_Intercept) 1661 2807
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept 2.02 0.04 1.95 2.09 1.00 2641 3963
logn2oeq_Intercept 2.00 0.02 1.96 2.04 1.00 3446 4550
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_logn2o 0.40 0.01 0.37 0.42 1.00 2749 5523
sigma_logn2oeq 0.08 0.00 0.08 0.09 1.00 4790 6698
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq) 0.21 0.04 0.14 0.27 1.00 6172 7243
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
In the summary above, the estimated standard deviations for the
varying group effects on the mean behavior of the dissolved N2O response
suggested fairly low, but non-zero variability across each of the three
levels. The standard deviations estimated for the same varying effects
for equilibrium N2O were also relatively small. Finally, note the
relatively small, but positive residual correlation between the two N2O
responses.
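For readers less familiar with the \(\hat{R}\) diagnostic reported in the summary, the following base R sketch computes a classical split \(\hat{R}\) for a single parameter. It is a simplified illustration on simulated chains, not the rank-normalized version that current software typically reports, and the function name `split_rhat` is hypothetical.

```r
# Classical split R-hat for a matrix of draws (iterations x chains).
split_rhat <- function(draws) {
  n_iter <- nrow(draws)
  half <- floor(n_iter / 2)
  # Split each chain into two halves so that within-chain drift is detected.
  halves <- cbind(draws[1:half, , drop = FALSE],
                  draws[(n_iter - half + 1):n_iter, , drop = FALSE])
  n <- nrow(halves)
  chain_means <- colMeans(halves)
  chain_vars <- apply(halves, 2, var)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(chain_vars)                 # within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(2)
# Four well-mixed chains of 2500 post-warmup draws each (simulated).
mixed <- matrix(rnorm(2500 * 4), nrow = 2500, ncol = 4)
# The same chains, with one chain shifted to mimic poor mixing.
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 2

rhat_mixed <- split_rhat(mixed)
rhat_stuck <- split_rhat(stuck)
print(round(c(mixed = rhat_mixed, stuck = rhat_stuck), 3))
```

Values close to 1 indicate that the chains agree with one another, which is the pattern described for the fitted model above.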
Before investing too much into the interpretation of this model,
however, the model fit was evaluated below using a series of graphical
posterior predictive checks (PPCs; Gelman et al. 2014; Gelman, Hill, and Vehtari 2020, Ch. 11).
Model checks
Dissolved
N2O
Below are a series of panels illustrating graphical PPCs for the
log(N2O) component of the model. The top left panel compares a density
plot of the observed data (black line) to density lines drawn for 200
samples from the posterior predictive distribution (PPD; blue lines) of
the fitted model. The top right panel similarly compares the cumulative
density distributions. The left middle panel simultaneously compares
means vs. standard deviations for 1000 draws from the PPD (blue
dots) to the sample mean and standard deviation (black dot). The right
middle panel compares skewness vs. kurtosis for 1000 draws from
the PPD to the skewness and kurtosis values calculated for the observed
data. The bottom left panel compares max vs. min values for
1000 draws from the PPD to the max and min values of the sample data.
Finally, the bottom right panel shows the observed vs. average
predicted values for each observation in the sample. The average
predicted values were calculated as the mean prediction for each
observation in the PPD based on 1000 draws.

The general takeaway from the PPCs above was that the model
replicated the central tendency of the observed data fairly well, but
failed to sufficiently replicate other important aspects of the
distribution, such as skewness and kurtosis. The observed vs.
average predictions scatterplot suggested substantial heteroscedasticity
in the errors.
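The logic behind the test-statistic panels can be sketched in base R as follows. Everything here is simulated: `y` is a heavy-tailed stand-in for the observed data and `y_rep` is a stand-in for draws from a Gaussian PPD, so the numbers do not correspond to the fitted model. The helper functions `skewness` and `kurtosis` are simple moment-based definitions written for this illustration.

```r
# Moment-based skewness and (non-excess) kurtosis.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2
}

set.seed(3)
n_obs <- 984
n_rep <- 1000

# Stand-in for observed data with heavier tails than a Gaussian.
y <- 2 + 0.4 * rt(n_obs, df = 4)

# Stand-in for PPD draws (rows = draws) from a Gaussian model that
# matches the mean and standard deviation of y.
y_rep <- matrix(rnorm(n_rep * n_obs, mean = mean(y), sd = sd(y)),
                nrow = n_rep)

# Test statistics for the observed data and for each replicated dataset.
stat_obs <- c(skew = skewness(y), kurt = kurtosis(y))
stat_rep <- cbind(skew = apply(y_rep, 1, skewness),
                  kurt = apply(y_rep, 1, kurtosis))

# Share of replicated datasets with kurtosis at least as large as observed.
p_kurt <- mean(stat_rep[, "kurt"] >= stat_obs[["kurt"]])
print(round(c(stat_obs, p_kurt = p_kurt), 3))
```

A share near 0 or 1 indicates that the model rarely reproduces the observed statistic, which is the kind of misfit in tail behavior described above.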
The same checks were run below, but for the test set of 95 held-out,
second-visit data points.

The patterns in misfit indicated above for the re-visit data were
similar to the patterns indicated in the PPCs with the training
data.
Equilibrium
N2O
Below are PPCs for the equilibrium N2O component of the model. As
with the dissolved N2O response above, the model did a reasonable job of
replicating the central tendency, but performed less well at replicating
some important aspects of the overall distribution.

Below are the same PPCs for equilibrium N2O in the re-visit
sites.

Bivariate
The graphical check below compares bivariate density contours
estimated from the observed data (black lines) to density contours
estimated for each of 20 draws from the PPD. The model appeared to do a
good job of replicating the bivariate mean, but was poor at representing
the overall joint distribution.

The same bivariate check is shown below for the re-visit data.

Saturation
The graphical PPCs below were aimed at evaluating how well the
multivariate model did at representing the observed saturation ratio:
\[N_2O_{dissolved}:N_2O_{equilibrium}\] This
quantity was estimated as a derived variable by simply dividing the N2O
PPD by the equilibrium N2O PPD. Likewise, the proportion of
under-saturated lakes in the sample was estimated by summing the number
of lakes from each posterior predictive draw wherein the ratio was <
1 and dividing that number by the total number of lakes in the sample,
which was 984. Overall, these checks indicated that properly
representing the tails of the N2O and N2O-eq observations would likely
be necessary in order to better replicate the observed saturation
metrics. For example, the model did a poor job replicating the observed
proportion of under-saturated lakes, underestimating it by more than 10
percentage points, on average.

The top left panel, above, is a density plot of the observed
saturation ratio (black line) compared to an estimate using 50 draws
from the model (blue lines). The top right panel shows the observed
proportion of under-saturated lakes compared to a model estimate based
on 1000 draws from the PPD. The left middle panel shows the mean
vs. standard deviation of the saturation ratio for the observed
data compared to the same estimates for 500 posterior draws from the
model’s PPD. The right middle panel shows the max vs. min for
the sample compared to 500 draws from the model’s PPD. Finally, the
bottom left panel shows the observed vs. average predicted
saturation ratio for all 984 lakes sampled in the dataset.
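The derived saturation quantities described above can be sketched in base R as follows. The two matrices are simulated stand-ins for the posterior predictive draws of the two responses (rows are draws, columns are lakes). In the actual workflow they would come from the same joint draws of the fitted multivariate model, which preserves the residual correlation. The means and standard deviations used here are illustrative only.

```r
set.seed(4)
n_draws <- 1000
n_lakes <- 984

# Stand-ins for PPD draws on the modeled (log) scale; values are illustrative.
log_n2o_rep   <- matrix(rnorm(n_draws * n_lakes, mean = 2, sd = 0.45),
                        nrow = n_draws)
log_n2oeq_rep <- matrix(rnorm(n_draws * n_lakes, mean = 2, sd = 0.10),
                        nrow = n_draws)

# Back-transform each response and take the ratio, draw by draw and lake by lake.
sat_ratio_rep <- exp(log_n2o_rep) / exp(log_n2oeq_rep)

# Proportion of under-saturated lakes (ratio < 1) within each draw.
prop_under_rep <- rowMeans(sat_ratio_rep < 1)

# Posterior predictive summary of the proportion under-saturated.
summary_under <- c(mean = mean(prop_under_rep),
                   quantile(prop_under_rep, probs = c(0.025, 0.975)))
print(round(summary_under, 3))
```

Because the ratio is computed within each draw, uncertainty in both responses propagates to the saturation ratio and to the proportion of under-saturated lakes.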
The same PPCs are shown below for the re-visit data.

The checks above indicated that the model did a similarly
underwhelming job of replicating some key properties of the saturation
metrics calculated from the re-visit data.
R-square
Below, the Bayesian \(R^2\) values
are reported for each response in the model.
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.247 0.031 0.187 0.309
Estimate Est.Error Q2.5 Q97.5
R2logn2oeq 0.377 0.025 0.328 0.425
The \(R^2\) values were also estimated for
the re-visit data.
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.413 0.04 0.331 0.489
Estimate Est.Error Q2.5 Q97.5
R2logn2oeq 0.322 0.026 0.271 0.374
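For reference, a residual-based Bayesian \(R^2\) can be computed per posterior draw as the variance of the fitted values divided by the sum of that variance and the residual variance. The base R sketch below assumes this definition and uses simulated data; `bayes_r2_draws` is a hypothetical helper, and `mu_draws` stands in for a draws-by-observations matrix of expected values from a fitted model.

```r
# Residual-based Bayesian R-squared, one value per posterior draw.
bayes_r2_draws <- function(y, mu_draws) {
  var_fit <- apply(mu_draws, 1, var)   # variance of fitted values per draw
  resid <- sweep(mu_draws, 2, y)       # mu - y; the sign does not affect variance
  var_res <- apply(resid, 1, var)      # residual variance per draw
  var_fit / (var_fit + var_res)
}

set.seed(5)
n_obs <- 200
group <- rep(1:20, each = 10)
group_eff <- rnorm(20, mean = 0, sd = 0.3)

# Simulated response with group-level structure and residual noise.
y <- 2 + group_eff[group] + rnorm(n_obs, mean = 0, sd = 0.4)

# Stand-in for posterior draws of the expected value (draws x observations).
mu_draws <- t(replicate(500, 2 + group_eff[group] + rnorm(n_obs, 0, 0.03)))

r2 <- bayes_r2_draws(y, mu_draws)
print(round(c(Estimate = mean(r2), quantile(r2, c(0.025, 0.975))), 3))
```

The summary printed at the end mirrors the layout of the tables above (a point estimate with a 95% interval).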
Model 2
In an attempt to better fit the observed data, the next model
included distributional sub-models to allow for heterogeneous variances
for each response conditional on the survey design structure.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(log(n2o) ~ 1 +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ 1 +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
bf_n2oeq <- bf(log(n2o_eq) ~ 1 +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ 1 +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
prior(exponential(2), class = "sd", resp = "logn2o"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"),
prior(exponential(2), class = "sd", resp = "logn2oeq"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
prior(lkj(2), class = "rescor"),
prior(lkj(2), class = "cor")
)
n2o_mod2 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.975, max_treedepth = 12),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 84512,
chains=4,
iter=5000,
cores=4)
save(n2o_mod2, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
Summarize fit
The summaries of the estimated parameters and key HMC convergence
diagnostics for the fitted model are printed below.
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = log
mu = identity; sigma = log
Formula: log(n2o) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
log(n2o_eq) ~ 1 + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.06 0.03 0.02 0.12 1.00
sd(logn2oeq_Intercept) 0.05 0.02 0.03 0.09 1.00
sd(sigma_logn2o_Intercept) 0.22 0.12 0.02 0.50 1.00
sd(sigma_logn2oeq_Intercept) 0.16 0.08 0.04 0.34 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.57 0.30 -0.18 0.95 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 2247 1772
sd(logn2oeq_Intercept) 3423 4761
sd(sigma_logn2o_Intercept) 1892 3094
sd(sigma_logn2oeq_Intercept) 2413 1991
cor(logn2o_Intercept,logn2oeq_Intercept) 2696 3538
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.08 0.02 0.04 0.11 1.00
sd(logn2oeq_Intercept) 0.04 0.01 0.03 0.05 1.00
sd(sigma_logn2o_Intercept) 0.56 0.08 0.40 0.74 1.00
sd(sigma_logn2oeq_Intercept) 0.22 0.05 0.11 0.32 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.42 0.19 -0.00 0.74 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 1276 1733
sd(logn2oeq_Intercept) 2695 5090
sd(sigma_logn2o_Intercept) 2007 3462
sd(sigma_logn2oeq_Intercept) 1851 1881
cor(logn2o_Intercept,logn2oeq_Intercept) 1076 1960
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.06 0.01 0.03 0.09 1.00
sd(logn2oeq_Intercept) 0.02 0.01 0.00 0.03 1.00
sd(sigma_logn2o_Intercept) 0.61 0.05 0.51 0.72 1.00
sd(sigma_logn2oeq_Intercept) 0.19 0.06 0.07 0.29 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) -0.00 0.34 -0.70 0.62 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 1329 2182
sd(logn2oeq_Intercept) 1052 1452
sd(sigma_logn2o_Intercept) 2622 4939
sd(sigma_logn2oeq_Intercept) 1603 1725
cor(logn2o_Intercept,logn2oeq_Intercept) 996 2063
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept 1.93 0.03 1.88 1.99 1.00 3338 4462
sigma_logn2o_Intercept -1.40 0.12 -1.63 -1.17 1.00 3511 5077
logn2oeq_Intercept 2.00 0.02 1.96 2.03 1.00 3384 4145
sigma_logn2oeq_Intercept -2.60 0.07 -2.75 -2.45 1.00 4004 4848
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq) 0.36 0.03 0.29 0.43 1.00 6474 7733
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
From the summary above, note the moderate and positive residual
correlation between the two N2O responses. The estimated standard
deviations for the varying group effects on the mean behavior of the
dissolved N2O response suggested fairly low, but non-zero variability
across each of the three levels. The standard deviations estimated for
the same varying effects for equilibrium N2O were also relatively small.
However, before investing too much into the interpretation of these
results, the model fit was further evaluated below using a series of
graphical posterior predictive checks (PPCs).
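Because the \(\sigma\) sub-models use a log link, the \(\sigma\) intercepts and the standard deviations of their varying effects in the summary above are on the log scale. The base R sketch below back-transforms the posterior means reported for Model 2. It works with rounded point estimates only, so it is an approximation rather than a summary of the full posterior.

```r
# Posterior means of the sigma intercepts from the Model 2 summary (log scale).
sigma_int <- c(logn2o = -1.40, logn2oeq = -2.60)

# Residual standard deviation for a typical group, on the log-response scale.
sigma_typical <- exp(sigma_int)

# Posterior means of the sd of the varying sigma intercepts at the
# WSA9:state:size_cat level (log scale).
sd_size <- c(logn2o = 0.61, logn2oeq = 0.19)

# Implied sigma for groups one standard deviation below and above typical.
sigma_range <- rbind(
  minus_1sd = exp(sigma_int - sd_size),
  typical   = sigma_typical,
  plus_1sd  = exp(sigma_int + sd_size)
)
print(round(sigma_range, 3))
```

The typical values, roughly 0.25 for log(N2O) and 0.07 for log(N2O-eq), can be compared with the constant \(\sigma\) estimates of 0.40 and 0.08 from Model 1. The spread across rows shows how much the residual variability was allowed to differ among size-class groups.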
Model checks
Below the same PPCs were performed as with the initial model (see
above for more details on each panel).
Dissolved
N2O
Though the checks below suggest some improvement in replicating the tails of the
observed data, this model did a poorer job at replicating central
tendency.

Equilibrium
N2O
The checks below suggest this model offered no improvement upon the
initial model for equilibrium N2O. This model also appeared to do a
poorer job of replicating the mean and overall standard deviation
compared to the initial model.

Bivariate
This check perhaps suggested an improvement with regard to
replicating the joint density. However, the predictions were still
clearly over-dispersed relative to the observations.

Saturation
The PPCs for the saturation metrics below indicated that including
the distributional models was perhaps an improvement on the initial
model in some aspects; in particular, the bias in the predicted
proportion of under-saturated lakes was substantially decreased.
However, there appeared to still be issues in replicating the tails as
well as issues with central tendency.

R-square
Relative to model 1, there was a substantial decrease in the \(R^2\) estimate for the dissolved N2O
component of this model. The estimate for the equilibrium N2O
component was similar to that of model 1.
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.056 0.009 0.038 0.075
Estimate Est.Error Q2.5 Q97.5
R2logn2oeq 0.379 0.022 0.335 0.421
Model 3
In the next model, we used covariates to try to improve the fit. The
categorical version of the NO3 covariate was used as a monotonic ordinal
predictor in the dissolved N2O component of the model, along with surface
temperature. For the equilibrium
N2O component, we included surface temperature and log-transformed
elevation, along with their interaction. The models also retained the
distributional specifications included in model 2 above.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
surftemp +
(mo(no3_cat) | a | WSA9) +
(mo(no3_cat) | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ 1 +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
log_elev +
surftemp:log_elev +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ 1 +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
prior(normal(0, 1), class = "b", resp = "logn2o"),
prior(exponential(2), class = "sd", resp = "logn2o"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"),
prior(normal(0, 1), class = "b", resp = "logn2oeq"),
prior(exponential(2), class = "sd", resp = "logn2oeq"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
prior(lkj(2), class = "rescor"),
prior(lkj(2), class = "cor")
)
n2o_mod3 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.975, max_treedepth = 12),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 98456,
chains=4,
iter=5000,
cores=4)
save(n2o_mod3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
Summarize fit
The fitted parameters and MCMC diagnostics are below.
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = log
mu = identity; sigma = log
Formula: log(n2o) ~ mo(no3_cat) + surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ 1 + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
b_logn2o ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat1 ~ dirichlet(1)
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.05 0.02 0.02 0.10 1.00
sd(logn2o_mono3_cat) 0.14 0.05 0.06 0.26 1.00
sd(logn2oeq_Intercept) 0.04 0.01 0.02 0.07 1.00
sd(sigma_logn2o_Intercept) 0.12 0.08 0.01 0.31 1.00
sd(sigma_logn2oeq_Intercept) 0.36 0.11 0.19 0.64 1.00
cor(logn2o_Intercept,logn2o_mono3_cat) -0.14 0.33 -0.72 0.53 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.36 0.29 -0.28 0.83 1.00
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.37 0.29 -0.26 0.82 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 4265 3917
sd(logn2o_mono3_cat) 4747 5482
sd(logn2oeq_Intercept) 4798 5704
sd(sigma_logn2o_Intercept) 3419 4599
sd(sigma_logn2oeq_Intercept) 5027 6638
cor(logn2o_Intercept,logn2o_mono3_cat) 4970 6018
cor(logn2o_Intercept,logn2oeq_Intercept) 5073 6636
cor(logn2o_mono3_cat,logn2oeq_Intercept) 7256 7538
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.03 0.02 0.00 0.07 1.00
sd(logn2o_mono3_cat) 0.14 0.02 0.10 0.18 1.00
sd(logn2oeq_Intercept) 0.03 0.00 0.03 0.04 1.00
sd(sigma_logn2o_Intercept) 0.30 0.09 0.09 0.47 1.00
sd(sigma_logn2oeq_Intercept) 0.28 0.06 0.17 0.40 1.00
cor(logn2o_Intercept,logn2o_mono3_cat) -0.30 0.32 -0.82 0.45 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.00 0.27 -0.53 0.55 1.01
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.45 0.13 0.16 0.69 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 894 2100
sd(logn2o_mono3_cat) 3795 5396
sd(logn2oeq_Intercept) 4164 5957
sd(sigma_logn2o_Intercept) 1157 1279
sd(sigma_logn2oeq_Intercept) 2747 4313
cor(logn2o_Intercept,logn2o_mono3_cat) 550 693
cor(logn2o_Intercept,logn2oeq_Intercept) 596 971
cor(logn2o_mono3_cat,logn2oeq_Intercept) 2536 4431
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.06 0.01 0.04 0.08 1.00
sd(logn2oeq_Intercept) 0.00 0.00 0.00 0.01 1.00
sd(sigma_logn2o_Intercept) 0.58 0.06 0.47 0.70 1.00
sd(sigma_logn2oeq_Intercept) 0.28 0.05 0.18 0.39 1.00
cor(logn2o_Intercept,logn2oeq_Intercept) 0.26 0.39 -0.60 0.87 1.00
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 1429 2386
sd(logn2oeq_Intercept) 1779 3895
sd(sigma_logn2o_Intercept) 2003 4421
sd(sigma_logn2oeq_Intercept) 1812 3745
cor(logn2o_Intercept,logn2oeq_Intercept) 4523 5554
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept 2.40 0.05 2.29 2.50 1.00 5046 6296
sigma_logn2o_Intercept -1.70 0.08 -1.87 -1.55 1.00 5162 5452
logn2oeq_Intercept 3.10 0.05 3.01 3.19 1.00 8903 7808
sigma_logn2oeq_Intercept -3.54 0.14 -3.81 -3.27 1.00 4065 5338
logn2o_surftemp -0.02 0.00 -0.03 -0.02 1.00 5529 6684
logn2oeq_surftemp -0.04 0.00 -0.04 -0.04 1.00 9722 7658
logn2oeq_log_elev -0.07 0.01 -0.09 -0.06 1.00 9200 7505
logn2oeq_surftemp:log_elev 0.00 0.00 0.00 0.00 1.00 9527 8028
logn2o_mono3_cat 0.23 0.05 0.12 0.34 1.00 3948 4777
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_mono3_cat1[1] 0.02 0.01 0.00 0.04 1.00 5535 5409
logn2o_mono3_cat1[2] 0.09 0.02 0.04 0.13 1.00 4281 5251
logn2o_mono3_cat1[3] 0.21 0.05 0.13 0.32 1.00 3683 4649
logn2o_mono3_cat1[4] 0.69 0.05 0.58 0.77 1.00 3498 4251
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq) 0.15 0.04 0.07 0.23 1.00 11381 7610
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
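The monotonic NO3 effect in the summary above can be unpacked with a short base R sketch. It assumes the usual parameterization of monotonic effects, in which the effect at category \(k\) equals \(b \cdot D \cdot \sum_{i \le k} \zeta_i\), where \(D\) is the number of categories minus one and \(\zeta\) is the simplex. It uses the rounded posterior means for the population-level terms only, so the varying NO3 effects by ecoregion and state are ignored.

```r
# Rounded posterior means from the Model 3 summary.
b_mo <- 0.23                        # logn2o_mono3_cat
zeta <- c(0.02, 0.09, 0.21, 0.69)   # logn2o_mono3_cat1[1:4]

# Re-normalize the simplex, since the rounded values do not sum exactly to 1.
zeta <- zeta / sum(zeta)

# D = number of NO3 categories minus one.
D <- length(zeta)

# Cumulative effect on log(N2O) at each category, relative to the lowest.
mo_effect <- b_mo * D * c(0, cumsum(zeta))
names(mo_effect) <- paste0("cat_", 0:D)
print(round(mo_effect, 3))

# Multiplicative change in N2O from the lowest to the highest category.
fold_change <- exp(mo_effect[[D + 1]])
print(round(fold_change, 2))
```

Under these assumptions, the total effect across the NO3 categories is about 0.92 on the log scale, or roughly a 2.5-fold increase in N2O. About two-thirds of that change occurs between the two highest categories.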
Model checks
Dissolved
N2O
The PPCs below indicated a better fit compared to the previous
models. The central tendency and tail behavior looked to be reasonably
replicated by comparison. However, the observed vs. predicted
plot suggested that larger observed values were being systematically
underestimated.

Equilibrium
N2O
The PPCs below indicated that this model appeared to be an
improvement for equilibrium N2O as well. However, some checks (e.g.,
skewness) suggested some room for additional improvement.

Bivariate
The check for the joint distribution below also suggested an
improvement upon the previous models.

Saturation
This model looked to be an improvement with regard to the PPCs for
the saturation metrics. However, the proportion of under-saturated lakes
remained biased low and other checks indicated that further improvements
would be ideal.

R-square
The \(R^2\) estimates for this model
are below and suggested substantial improvements on the previous
models.
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.626 0.017 0.591 0.66
Estimate Est.Error Q2.5 Q97.5
R2logn2oeq 0.879 0.004 0.87 0.886
Covariate
effects
Below are plots illustrating the modeled effects of covariates on
both N2O and equilibrium N2O.
N2O
The conditional effects plots
below for N2O illustrate a positive, monotonic, and non-linear
relationship between NO3 and N2O; and a negative, linear relationship
between surface temperature and N2O.

Equilibrium
N2O
The modeled effects below for the equilibrium N2O component of the
model illustrated a negative relationship between equilibrium N2O and
both predictors and an interaction such that the surface temperature
effect became slightly steeper at lower elevations.

Model 4
In the next model, covariate terms were also included in the \(\sigma\) components of both models in order
to try to better capture remaining heterogeneity in the variances of
both N2O and N2O-eq.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
surftemp +
(mo(no3_cat) | a | WSA9) +
(mo(no3_cat) | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ mo(no3_cat) +
surftemp +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
log_elev +
surftemp:log_elev +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ surftemp +
log_elev +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
prior(normal(0, 1), class = "b", resp = "logn2o"),
prior(exponential(2), class = "sd", resp = "logn2o"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"),
prior(normal(0, 1), class = "b", resp = "logn2oeq"),
prior(exponential(2), class = "sd", resp = "logn2oeq"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
prior(lkj(2), class = "rescor"),
prior(lkj(2), class = "cor")
)
n2o_mod4 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.975, max_treedepth = 12),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 15851,
chains=4,
iter=5000,
cores=4)
save(n2o_mod4, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
Summarize fit
Below is a summary of the fitted parameters along with some
convergence diagnostics.
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = log
mu = identity; sigma = log
Formula: log(n2o) ~ mo(no3_cat) + surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ mo(no3_cat) + surftemp + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
b_logn2o ~ normal(0, 1)
b_logn2o_sigma ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
b_logn2oeq_sigma ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat1 ~ dirichlet(1)
simo_logn2o_sigma_mono3_cat1 ~ dirichlet(1)
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.050 0.020 0.020 0.100 1.000
sd(logn2o_mono3_cat) 0.145 0.054 0.065 0.272 1.000
sd(logn2oeq_Intercept) 0.036 0.012 0.019 0.065 1.001
sd(sigma_logn2o_Intercept) 0.113 0.080 0.005 0.303 1.001
sd(sigma_logn2oeq_Intercept) 0.208 0.098 0.039 0.435 1.001
cor(logn2o_Intercept,logn2o_mono3_cat) -0.186 0.327 -0.755 0.489 1.000
cor(logn2o_Intercept,logn2oeq_Intercept) 0.344 0.290 -0.277 0.817 1.000
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.364 0.286 -0.269 0.819 1.001
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 4733 4348
sd(logn2o_mono3_cat) 4386 6208
sd(logn2oeq_Intercept) 3977 5215
sd(sigma_logn2o_Intercept) 2543 4693
sd(sigma_logn2oeq_Intercept) 2777 2177
cor(logn2o_Intercept,logn2o_mono3_cat) 4616 6092
cor(logn2o_Intercept,logn2oeq_Intercept) 5667 6074
cor(logn2o_mono3_cat,logn2oeq_Intercept) 7634 7624
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.035 0.017 0.003 0.068 1.001
sd(logn2o_mono3_cat) 0.117 0.022 0.076 0.162 1.001
sd(logn2oeq_Intercept) 0.033 0.003 0.027 0.040 1.000
sd(sigma_logn2o_Intercept) 0.181 0.099 0.011 0.374 1.004
sd(sigma_logn2oeq_Intercept) 0.287 0.057 0.177 0.403 1.001
cor(logn2o_Intercept,logn2o_mono3_cat) -0.317 0.311 -0.810 0.405 1.003
cor(logn2o_Intercept,logn2oeq_Intercept) 0.025 0.265 -0.497 0.557 1.005
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.448 0.165 0.098 0.738 1.001
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 981 2017
sd(logn2o_mono3_cat) 3149 4612
sd(logn2oeq_Intercept) 3457 6360
sd(sigma_logn2o_Intercept) 858 2373
sd(sigma_logn2oeq_Intercept) 2470 3862
cor(logn2o_Intercept,logn2o_mono3_cat) 945 1286
cor(logn2o_Intercept,logn2oeq_Intercept) 556 823
cor(logn2o_mono3_cat,logn2oeq_Intercept) 1518 3021
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.065 0.011 0.042 0.087 1.003
sd(logn2oeq_Intercept) 0.004 0.002 0.000 0.008 1.001
sd(sigma_logn2o_Intercept) 0.539 0.055 0.432 0.647 1.002
sd(sigma_logn2oeq_Intercept) 0.263 0.051 0.162 0.363 1.001
cor(logn2o_Intercept,logn2oeq_Intercept) 0.395 0.348 -0.479 0.904 1.001
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 1478 2832
sd(logn2oeq_Intercept) 1506 2453
sd(sigma_logn2o_Intercept) 1443 3611
sd(sigma_logn2oeq_Intercept) 2372 4366
cor(logn2o_Intercept,logn2oeq_Intercept) 4231 4956
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept 2.386 0.055 2.278 2.490 1.000 5971 7057
sigma_logn2o_Intercept -1.855 0.281 -2.389 -1.294 1.000 5965 7704
logn2oeq_Intercept 3.115 0.051 3.016 3.217 1.001 7109 7366
sigma_logn2oeq_Intercept -1.922 0.373 -2.634 -1.180 1.000 8445 7581
logn2o_surftemp -0.021 0.002 -0.026 -0.017 1.000 5850 7677
sigma_logn2o_surftemp -0.001 0.011 -0.023 0.021 1.001 6175 7768
logn2oeq_surftemp -0.042 0.002 -0.046 -0.039 1.001 8285 7516
logn2oeq_log_elev -0.080 0.008 -0.096 -0.065 1.001 7656 7564
logn2oeq_surftemp:log_elev 0.002 0.000 0.002 0.003 1.000 8407 7502
sigma_logn2oeq_surftemp -0.065 0.010 -0.085 -0.046 1.000 10043 8163
sigma_logn2oeq_log_elev -0.019 0.038 -0.095 0.054 1.000 6646 7910
logn2o_mono3_cat 0.225 0.058 0.108 0.340 1.000 4488 5143
sigma_logn2o_mono3_cat 0.256 0.037 0.187 0.331 1.000 6036 7643
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_mono3_cat1[1] 0.018 0.012 0.001 0.045 1.000 5709 4429
logn2o_mono3_cat1[2] 0.093 0.027 0.041 0.149 1.000 5034 5645
logn2o_mono3_cat1[3] 0.231 0.060 0.126 0.366 1.000 4974 5478
logn2o_mono3_cat1[4] 0.659 0.062 0.520 0.763 1.000 4446 5210
sigma_logn2o_mono3_cat1[1] 0.107 0.066 0.007 0.255 1.000 6168 4471
sigma_logn2o_mono3_cat1[2] 0.129 0.087 0.006 0.328 1.000 8105 5010
sigma_logn2o_mono3_cat1[3] 0.459 0.151 0.168 0.758 1.000 7272 5802
sigma_logn2o_mono3_cat1[4] 0.304 0.139 0.038 0.569 1.000 6289 4264
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq) 0.146 0.040 0.067 0.223 1.001 9839 8508
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
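As a rough illustration of the Rhat diagnostic reported in the summary above, the sketch below computes the classic split-chain potential scale reduction factor in base R for simulated draws. It is a simplified version: brms reports a rank-normalized refinement, so values from a fitted model will not match it exactly. The function name `split_rhat` and the simulated chains are hypothetical and not part of the original workflow.

```r
# Classic split-chain R-hat for a matrix of draws (rows = iterations,
# columns = chains). Simplified sketch, not the rank-normalized version.
split_rhat <- function(draws) {
  n_iter <- nrow(draws)
  half <- floor(n_iter / 2)
  # Split each chain into two halves so within-chain trends are detected
  split <- cbind(draws[1:half, , drop = FALSE],
                 draws[(n_iter - half + 1):n_iter, , drop = FALSE])
  n <- nrow(split)
  chain_means <- colMeans(split)
  chain_vars <- apply(split, 2, var)
  W <- mean(chain_vars)          # within-chain variance
  B <- n * var(chain_means)      # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(1)
# Four well-mixed hypothetical chains of 2500 post-warmup draws each
good <- matrix(rnorm(2500 * 4), ncol = 4)
# The same chains, with one shifted to mimic non-convergence
bad <- good
bad[, 1] <- bad[, 1] + 2

rhat_good <- split_rhat(good)
rhat_bad <- split_rhat(bad)
print(round(c(good = rhat_good, bad = rhat_bad), 3))
```

Values close to 1 indicate that the split chains agree with one another; the shifted chain produces a value well above 1.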
Model checks
Again, the same PPCs were employed for this model as above.
Dissolved N2O
Again, this model appeared to be an improvement on the
previous model, particularly with regard to the more constant variance
indicated in the observed vs. predicted plot (bottom, right
panel).

Equilibrium N2O
This component of the model also seemed to be an improvement over
model 3, with better representation in the tails as indicated in the
skewness vs. kurtosis PPC.

Bivariate
Again, an improvement over the previous model with a tighter fit of
the PPC to the observed bivariate density.

Saturation
This check also suggested an improvement over the previous models,
with better tail behavior and less bias in the proportion
under-saturated measure.
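The saturation check above can be sketched as follows, assuming the saturation ratio is the dissolved concentration divided by the equilibrium concentration, with values below 1 indicating under-saturation. All data and predictive draws here are simulated stand-ins with hypothetical parameter values; with a fitted brms model, the replicated matrices would instead come from something like `posterior_predict()` for each response, back-transformed with `exp()` for the log-scale responses.

```r
set.seed(42)
n_obs <- 200     # hypothetical number of sampled lakes
n_draws <- 500   # hypothetical number of posterior predictive draws

# Hypothetical "observed" concentrations on the natural scale
n2o_obs <- rlnorm(n_obs, meanlog = 2.0, sdlog = 0.3)
n2oeq_obs <- rlnorm(n_obs, meanlog = 2.1, sdlog = 0.1)

# Stand-ins for posterior predictive draws (rows = draws, cols = lakes)
n2o_rep <- matrix(rlnorm(n_draws * n_obs, 2.0, 0.3), nrow = n_draws)
n2oeq_rep <- matrix(rlnorm(n_draws * n_obs, 2.1, 0.1), nrow = n_draws)

# Saturation ratio: dissolved over equilibrium concentration
sat_obs <- n2o_obs / n2oeq_obs
sat_rep <- n2o_rep / n2oeq_rep

# Test statistic: proportion of under-saturated lakes (ratio < 1)
prop_under_obs <- mean(sat_obs < 1)
prop_under_rep <- rowMeans(sat_rep < 1)

# Compare the observed statistic with its predictive distribution
ppc_summary <- c(observed = prop_under_obs,
                 quantile(prop_under_rep, c(0.025, 0.5, 0.975)))
print(round(ppc_summary, 3))
```

Bias in this check would show up as the observed proportion falling in the tail of, or outside, the predictive interval.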

R-square
The Bayesian \(R^2\) estimates below
indicated an improvement from the previous models.
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.606 0.025 0.551 0.651
R2logn2oeq 0.875 0.005 0.865 0.883
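For readers unfamiliar with the Bayesian \(R^2\), the sketch below shows one common residual-based definition, computed per posterior draw as the variance of the fitted values divided by the sum of that variance and the residual variance. This is believed to be the definition behind `brms::bayes_R2()`, but that is an assumption here. The function `bayes_r2_sketch`, the simulated data, and the simulated coefficient draws are all hypothetical.

```r
# Residual-based Bayesian R^2, one value per posterior draw.
# y: observed response; ypred: fitted values (rows = draws, cols = obs)
bayes_r2_sketch <- function(y, ypred) {
  # Residuals for each draw: observed minus that draw's fitted values
  resid <- matrix(y, nrow(ypred), ncol(ypred), byrow = TRUE) - ypred
  var_fit <- apply(ypred, 1, var)
  var_res <- apply(resid, 1, var)
  var_fit / (var_fit + var_res)
}

set.seed(7)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1)   # hypothetical data

# Hypothetical posterior draws of an intercept and a slope
n_draws <- 400
a <- rnorm(n_draws, 1, 0.1)
b <- rnorm(n_draws, 2, 0.1)
ypred <- outer(a, rep(1, n)) + outer(b, x)

r2 <- bayes_r2_sketch(y, ypred)
r2_summary <- c(Estimate = mean(r2), Est.Error = sd(r2),
                quantile(r2, c(0.025, 0.975)))
print(round(r2_summary, 3))
```

The summary columns mirror the layout of the table above: a posterior mean, a posterior standard deviation, and a 95% interval.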
Covariate effects
N2O
The conditional effects plots for the covariate effects on N2O
remained largely unchanged from the previous model.

Below are estimates of the conditional effects of the covariates on
\(\sigma\) for N2O. These plots
suggested a large effect of NO3 on the variance of N2O, but little to no
effect of surface temperature.
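Because \(\sigma\) was modeled on the log scale, the NO3 coefficient acts multiplicatively on the residual standard deviation. The sketch below plugs the posterior means from the summary above (`sigma_logn2o_mono3_cat` and its simplex) into the monotonic-effect parameterization, under the assumption that `no3_cat` has five ordered levels coded 0 to 4 and that the effect at level k equals the coefficient times the number of simplex elements times the cumulative simplex sum. Using posterior means rather than full draws is an approximation.

```r
# Posterior means taken from the model 4 summary above
b_sigma_no3 <- 0.256                         # sigma_logn2o_mono3_cat
simplex <- c(0.107, 0.129, 0.459, 0.304)     # sigma_logn2o_mono3_cat1[1:4]

D <- length(simplex)                  # number of steps between levels
mo_scale <- D * c(0, cumsum(simplex)) # 0 at the lowest NO3 level

# Shift in log(sigma) relative to the lowest NO3 level
log_sigma_shift <- b_sigma_no3 * mo_scale

# Multiplicative change in sigma relative to the lowest NO3 level
sigma_ratio <- exp(log_sigma_shift)
names(sigma_ratio) <- paste0("no3_level_", 0:D)
print(round(sigma_ratio, 2))
```

Under these assumptions, the residual standard deviation at the highest NO3 level is roughly 2.8 times that at the lowest level, consistent with the large NO3 effect on variance noted above.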

Equilibrium N2O
The covariate effects on N2O-eq remained largely the same as for the
previous model.

The covariate effects on \(\sigma\)
for N2O-eq suggested a negative effect of surface temperature and little
to no effect of elevation.

Model 5
In the next model, more complexity was added to the N2O component by
including a covariate for lake surface area (log scale) as well as
interactions of NO3 with both log(area) and surface temperature.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
log_area +
surftemp +
mo(no3_cat):log_area +
mo(no3_cat):surftemp +
(mo(no3_cat) | a | WSA9) +
(mo(no3_cat) | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ log_area +
mo(no3_cat) +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
log_elev +
surftemp:log_elev +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
sigma ~ surftemp +
log_elev +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = gaussian())
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
prior(normal(0, 1), class = "b", resp = "logn2o"),
prior(exponential(2), class = "sd", resp = "logn2o"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"),
prior(normal(0, 1), class = "b", resp = "logn2oeq"),
prior(exponential(2), class = "sd", resp = "logn2oeq"),
prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
prior(lkj(2), class = "rescor"),
prior(lkj(2), class = "cor")
)
n2o_mod5 <- brm(bf_n2o +
bf_n2oeq +
set_rescor(rescor = TRUE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.975, max_treedepth = 12),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 54741,
chains=4,
iter=5000,
cores=4)
save(n2o_mod5, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
Summarize fit
Below is a summary of the fitted parameters along with MCMC
convergence diagnostics.
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = log
mu = identity; sigma = log
Formula: log(n2o) ~ mo(no3_cat) + log_area + surftemp + mo(no3_cat):log_area + mo(no3_cat):surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ log_area + mo(no3_cat) + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
log(n2o_eq) ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
sigma ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
b_logn2o ~ normal(0, 1)
b_logn2o_sigma ~ normal(0, 1)
b_logn2oeq ~ normal(0, 1)
b_logn2oeq_sigma ~ normal(0, 1)
Intercept_logn2o ~ normal(2, 1)
Intercept_logn2o_sigma ~ normal(-1, 2)
Intercept_logn2oeq ~ normal(2, 1)
Intercept_logn2oeq_sigma ~ normal(-1, 2)
L ~ lkj_corr_cholesky(2)
Lrescor ~ lkj_corr_cholesky(2)
<lower=0> sd_logn2o ~ exponential(2)
<lower=0> sd_logn2o_sigma ~ exponential(2)
<lower=0> sd_logn2oeq ~ exponential(2)
<lower=0> sd_logn2oeq_sigma ~ exponential(2)
simo_logn2o_mono3_cat:log_area1 ~ dirichlet(1)
simo_logn2o_mono3_cat:surftemp1 ~ dirichlet(1)
simo_logn2o_mono3_cat1 ~ dirichlet(1)
simo_logn2o_sigma_mono3_cat1 ~ dirichlet(1)
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.048 0.019 0.020 0.094 1.000
sd(logn2o_mono3_cat) 0.081 0.050 0.007 0.199 1.003
sd(logn2oeq_Intercept) 0.034 0.011 0.018 0.062 1.000
sd(sigma_logn2o_Intercept) 0.111 0.076 0.006 0.293 1.000
sd(sigma_logn2oeq_Intercept) 0.209 0.102 0.036 0.445 1.001
cor(logn2o_Intercept,logn2o_mono3_cat) -0.056 0.360 -0.713 0.644 1.000
cor(logn2o_Intercept,logn2oeq_Intercept) 0.464 0.284 -0.173 0.885 1.000
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.259 0.341 -0.467 0.824 1.000
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 4413 4624
sd(logn2o_mono3_cat) 1412 2265
sd(logn2oeq_Intercept) 4061 6633
sd(sigma_logn2o_Intercept) 2790 4064
sd(sigma_logn2oeq_Intercept) 2176 1898
cor(logn2o_Intercept,logn2o_mono3_cat) 5823 6525
cor(logn2o_Intercept,logn2oeq_Intercept) 4935 6214
cor(logn2o_mono3_cat,logn2oeq_Intercept) 4684 4770
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.046 0.014 0.016 0.072 1.001
sd(logn2o_mono3_cat) 0.097 0.024 0.053 0.146 1.001
sd(logn2oeq_Intercept) 0.033 0.003 0.027 0.040 1.001
sd(sigma_logn2o_Intercept) 0.207 0.088 0.024 0.369 1.007
sd(sigma_logn2oeq_Intercept) 0.285 0.056 0.177 0.395 1.002
cor(logn2o_Intercept,logn2o_mono3_cat) -0.265 0.284 -0.762 0.336 1.001
cor(logn2o_Intercept,logn2oeq_Intercept) 0.163 0.200 -0.232 0.559 1.005
cor(logn2o_mono3_cat,logn2oeq_Intercept) 0.266 0.209 -0.155 0.649 1.002
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 1323 1765
sd(logn2o_mono3_cat) 2612 2569
sd(logn2oeq_Intercept) 3321 5286
sd(sigma_logn2o_Intercept) 776 1082
sd(sigma_logn2oeq_Intercept) 2464 3702
cor(logn2o_Intercept,logn2o_mono3_cat) 1708 3060
cor(logn2o_Intercept,logn2oeq_Intercept) 960 1715
cor(logn2o_mono3_cat,logn2oeq_Intercept) 883 1843
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(logn2o_Intercept) 0.035 0.015 0.004 0.062 1.004
sd(logn2oeq_Intercept) 0.003 0.002 0.000 0.007 1.003
sd(sigma_logn2o_Intercept) 0.479 0.056 0.374 0.591 1.006
sd(sigma_logn2oeq_Intercept) 0.260 0.052 0.160 0.360 1.001
cor(logn2o_Intercept,logn2oeq_Intercept) 0.235 0.413 -0.658 0.872 1.002
Bulk_ESS Tail_ESS
sd(logn2o_Intercept) 819 1375
sd(logn2oeq_Intercept) 1569 3553
sd(sigma_logn2o_Intercept) 1318 3346
sd(sigma_logn2oeq_Intercept) 2326 3796
cor(logn2o_Intercept,logn2oeq_Intercept) 2950 5425
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
logn2o_Intercept 2.380 0.055 2.273 2.487 1.001 4020 6632
sigma_logn2o_Intercept -1.596 0.097 -1.789 -1.408 1.002 4961 6145
logn2oeq_Intercept 3.116 0.051 3.019 3.218 1.000 6860 7546
sigma_logn2oeq_Intercept -1.922 0.374 -2.639 -1.179 1.000 7865 7774
logn2o_log_area 0.029 0.003 0.023 0.034 1.000 9054 8090
logn2o_surftemp -0.025 0.002 -0.029 -0.021 1.000 3996 7106
sigma_logn2o_log_area -0.095 0.020 -0.135 -0.055 1.001 6068 7491
logn2oeq_surftemp -0.042 0.002 -0.046 -0.039 1.000 7418 8290
logn2oeq_log_elev -0.080 0.008 -0.097 -0.065 1.000 6834 7865
logn2oeq_surftemp:log_elev 0.003 0.000 0.002 0.003 1.000 7339 7992
sigma_logn2oeq_surftemp -0.065 0.010 -0.085 -0.046 1.000 9671 8132
sigma_logn2oeq_log_elev -0.018 0.038 -0.096 0.054 1.000 6265 7038
logn2o_mono3_cat 0.026 0.127 -0.226 0.275 1.004 1971 3252
logn2o_mono3_cat:log_area -0.036 0.010 -0.054 -0.016 1.001 2602 4549
logn2o_mono3_cat:surftemp 0.014 0.006 0.003 0.026 1.004 1478 2416
sigma_logn2o_mono3_cat 0.246 0.036 0.179 0.321 1.001 4246 6738
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
logn2o_mono3_cat1[1] 0.026 0.024 0.001 0.089 1.001 4261
logn2o_mono3_cat1[2] 0.091 0.066 0.005 0.252 1.001 1875
logn2o_mono3_cat1[3] 0.250 0.146 0.034 0.615 1.002 2027
logn2o_mono3_cat1[4] 0.633 0.166 0.206 0.867 1.003 1411
logn2o_mono3_cat:log_area1[1] 0.060 0.041 0.005 0.161 1.000 4395
logn2o_mono3_cat:log_area1[2] 0.042 0.039 0.001 0.142 1.000 7129
logn2o_mono3_cat:log_area1[3] 0.297 0.161 0.047 0.681 1.000 5220
logn2o_mono3_cat:log_area1[4] 0.602 0.176 0.166 0.865 1.000 4431
logn2o_mono3_cat:surftemp1[1] 0.043 0.040 0.006 0.144 1.002 3541
logn2o_mono3_cat:surftemp1[2] 0.080 0.056 0.015 0.222 1.001 3667
logn2o_mono3_cat:surftemp1[3] 0.276 0.111 0.105 0.573 1.001 3634
logn2o_mono3_cat:surftemp1[4] 0.601 0.142 0.168 0.785 1.002 2731
sigma_logn2o_mono3_cat1[1] 0.131 0.074 0.010 0.288 1.000 6840
sigma_logn2o_mono3_cat1[2] 0.151 0.096 0.010 0.367 1.000 7630
sigma_logn2o_mono3_cat1[3] 0.444 0.149 0.149 0.734 1.001 6331
sigma_logn2o_mono3_cat1[4] 0.275 0.138 0.026 0.544 1.001 4902
Tail_ESS
logn2o_mono3_cat1[1] 5305
logn2o_mono3_cat1[2] 3916
logn2o_mono3_cat1[3] 2856
logn2o_mono3_cat1[4] 2439
logn2o_mono3_cat:log_area1[1] 3509
logn2o_mono3_cat:log_area1[2] 6046
logn2o_mono3_cat:log_area1[3] 5683
logn2o_mono3_cat:log_area1[4] 4505
logn2o_mono3_cat:surftemp1[1] 3002
logn2o_mono3_cat:surftemp1[2] 2580
logn2o_mono3_cat:surftemp1[3] 3352
logn2o_mono3_cat:surftemp1[4] 2289
sigma_logn2o_mono3_cat1[1] 4269
sigma_logn2o_mono3_cat1[2] 5795
sigma_logn2o_mono3_cat1[3] 6964
sigma_logn2o_mono3_cat1[4] 5797
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(logn2o,logn2oeq) 0.141 0.038 0.066 0.216 1.000 11829 8204
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Model checks
Again, the same PPCs as above were performed for this model.
N2O PPC
This PPC for N2O looked similar to the previous model.

Equilibrium N2O
Again, the PPCs for this model were similar to the previous model,
which was unsurprising given that it was the same model for N2O-eq.

Bivariate
This PPC was also similar to the previous model.

Saturation
This check was also similar to the previous model, with perhaps
slightly less bias in the proportion under-saturated estimates. There was
also a potentially concerning extreme prediction in the observed
vs. predicted PPC.

R-square
Estimate Est.Error Q2.5 Q97.5
R2logn2o 0.629 0.03 0.563 0.68
R2logn2oeq 0.874 0.005 0.864 0.882
Covariate effects
N2O
The conditional effects plot for the covariate effects on N2O suggested
a similar effect of NO3, but interesting interactions between NO3 and
lake area and NO3 and surface temperature. For lake area, the effect was
estimated to be largest in magnitude and negative at the highest levels of NO3,
and slightly positive at the lowest level of NO3 (where the summary above
gives a log(area) coefficient of about 0.029). For surface
temperature, the effect was estimated to be largest and positive at the
highest level of NO3; and negative at the lowest level of NO3.
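The temperature interaction described above can be checked approximately from the posterior means in the model 5 summary. The sketch assumes `no3_cat` has five ordered levels coded 0 to 4, and that the slope of surface temperature at NO3 level k is the main-effect coefficient plus the interaction coefficient times the number of simplex elements times the cumulative simplex sum. Plugging in posterior means ignores posterior uncertainty and correlation, so this is only a rough guide.

```r
# Posterior means taken from the model 5 summary above
b_surftemp <- -0.025                        # logn2o_surftemp
b_interaction <- 0.014                      # logn2o_mono3_cat:surftemp
simplex <- c(0.043, 0.080, 0.276, 0.601)    # logn2o_mono3_cat:surftemp1[1:4]

D <- length(simplex)
mo_scale <- D * c(0, cumsum(simplex))   # 0 at the lowest NO3 level

# Slope of log(N2O) on surface temperature at each NO3 level
slope_by_no3 <- b_surftemp + b_interaction * mo_scale
names(slope_by_no3) <- paste0("no3_level_", 0:D)
print(round(slope_by_no3, 3))
```

Under these assumptions the temperature slope is negative at the lowest NO3 level and becomes positive only at the highest level, which agrees with the pattern described for the conditional effects plot.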

The estimated covariate effects on \(\sigma\) suggested a negative relationship
with log(area) and a positive relationship, again, with NO3.

Equilibrium N2O
The estimated covariate effects on N2O-eq remained largely the same as
estimated in the previous model.


A Final Model
As demonstrated above, models excluding the NO3 covariate
consistently resulted in poorer fits to the observed dissolved N2O
data. Including surface temperature and elevation in the equilibrium N2O
part of the model resulted in substantially improved replication of key
aspects of the observed data. Likewise, added flexibility in the
distributional terms for both dissolved and equilibrium N2O led to
improvements.
To make inferences from this model for N2O in the population of
interest, however, the included covariates needed to be either (1) fully
observed across that population or (2) modeled to account for their
missingness. For the lake area and elevation covariates, data were
available for all lakes from previously compiled geospatial databases.
However, neither surface temperature nor NO3 was observed for lakes
outside of the sample; both were only partially observed with respect to
the target population, so their missingness needed to be accounted for in
the model. Therefore, a more complex model was constructed below that
included surface temperature and NO3 as additional responses conditioned
on the survey design variables and fully observed covariates. This
approach to inference for N2O was similar to a Bayesian structural
equation model (Merkle et al. 2021; Merkle and Rosseel
2018). The main details of the logical dependence structure
could be characterized as:
\[\begin{align}
\color{#1F449C}{\boldsymbol{N_2O_{diss}}} &=\sim Survey + Area +
\color{#F05039}{\boldsymbol{NO_3}} + \color{#EEBAB4}{\boldsymbol{Temp}}
\\
\color{#A8B6CC}{\boldsymbol{N_2O_{equil}}} &=\sim Survey + Elev +
\color{#EEBAB4}{\boldsymbol{Temp}}\\
\color{#F05039}{\boldsymbol{NO_3}} &=\sim Survey + Area +
\color{#EEBAB4}{\boldsymbol{Temp}} \\
\color{#EEBAB4}{\boldsymbol{Temp}} &=\sim Survey + Lat + Elev + Day
\end{align}\]
Variables in colored text above were treated as partially observed with
respect to the population of interest (i.e., observed only in the
sample), whereas variables in black text were considered fully observed.
The partially observed variables (dissolved and equilibrium N2O,
NO3, and surface temperature) were each modeled conditional on the
survey design variables and other partially and/or fully observed
covariates. This structural equation approach requires a more complex
set of post-processing steps than a typical MRP analysis. In
order to propagate estimates and uncertainty through the dependency
structure and make inferences, the fitted model was first used to
predict surface temperature in the target population, since it depended
only on the fully observed covariates. That predictive distribution was
then used alongside the relevant fully observed covariates to predict
NO3 in the target population. Finally, the predictive distributions for
temperature and NO3 were used to predict the N2O responses. These steps
were carried out in the “Predict to population” section to follow.
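The three prediction steps described above can be sketched with a toy simulation in base R. Everything in this block is hypothetical: the population, the parameter draws, the fixed thresholds and shapes, and the simplification of treating the NO3 category as a linear (rather than monotonic) effect. In the actual workflow the draws would come from the fitted multivariate brms model and would include the survey design effects.

```r
set.seed(123)
n_lakes <- 300   # hypothetical target population size
n_draws <- 200   # hypothetical number of posterior draws

# Fully observed covariate (hypothetical values)
pop <- data.frame(lat = runif(n_lakes, 30, 48))

# Hypothetical posterior draws for a few parameters of each submodel
draws <- data.frame(
  a_temp = rnorm(n_draws, 3.8, 0.05),
  b_temp_lat = rnorm(n_draws, -0.016, 0.001),
  b_no3_temp = rnorm(n_draws, -0.14, 0.02),
  a_n2o = rnorm(n_draws, 2.4, 0.05),
  b_n2o_no3 = rnorm(n_draws, 0.2, 0.03),
  b_n2o_temp = rnorm(n_draws, -0.025, 0.002)
)
cutpoints <- c(-3, -2, -1, 0)   # hypothetical fixed thresholds
shape_temp <- 50                # hypothetical Gamma shapes
shape_n2o <- 20

# Draw one ordinal category (0..4) per lake from a cumulative logit model
r_cumlogit <- function(eta, cuts) {
  u <- runif(length(eta))
  # P(Y <= k) = plogis(cut_k - eta); category = number of cutpoints exceeded
  cum_p <- plogis(outer(-eta, cuts, "+"))
  rowSums(u > cum_p)
}

pop_mean_n2o <- numeric(n_draws)
for (d in seq_len(n_draws)) {
  p <- draws[d, ]
  # Step 1: temperature from fully observed covariates only
  mu_temp <- exp(p$a_temp + p$b_temp_lat * pop$lat)
  temp <- rgamma(n_lakes, shape = shape_temp, rate = shape_temp / mu_temp)
  # Step 2: NO3 category given the predicted temperature
  no3 <- r_cumlogit(p$b_no3_temp * temp, cutpoints)
  # Step 3: N2O given the predicted NO3 and temperature
  mu_n2o <- exp(p$a_n2o + p$b_n2o_no3 * no3 + p$b_n2o_temp * temp)
  n2o <- rgamma(n_lakes, shape = shape_n2o, rate = shape_n2o / mu_n2o)
  # Population-level summary for this draw
  pop_mean_n2o[d] <- mean(n2o)
}

print(round(quantile(pop_mean_n2o, c(0.025, 0.5, 0.975)), 2))
```

Because each draw carries its own temperature and NO3 predictions into the N2O step, the spread of `pop_mean_n2o` reflects uncertainty from all three submodels rather than from the N2O submodel alone.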
In the final model below, the submodel for surface temperature
assumed Gamma-distributed errors, and the linear predictor
included the survey design variables, latitude, elevation, and Julian
date. The shape parameter was also modeled as a function of latitude to
address increasing response variance along the latitudinal gradient. The
NO3 submodel was a cumulative logit formulation and the linear predictor
included all of the survey factors as well as surface temperature and
lake area.
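To make the cumulative logit formulation concrete, the sketch below converts thresholds and a linear predictor into category probabilities, using P(Y <= k) = logistic(threshold_k - eta). The numeric values are the posterior means reported in the model summary further below (`no3cat_Intercept[1]` to `[4]`, `no3cat_surftemp`, `no3cat_log_area`). Setting all group-level effects to zero and the covariate values chosen here are assumptions for illustration, as is the function name `no3_probs`.

```r
# Posterior means from the final model summary further below
thresholds <- c(-3.025, -2.059, -1.027, -0.028)  # no3cat_Intercept[1:4]
b_surftemp <- -0.141                             # no3cat_surftemp
b_log_area <- 0.068                              # no3cat_log_area

# Category probabilities for one lake, ignoring group-level effects
no3_probs <- function(surftemp, log_area) {
  eta <- b_surftemp * surftemp + b_log_area * log_area
  # Cumulative probabilities P(Y <= k) for k = 1..5 (the last is 1)
  cum_p <- c(plogis(thresholds - eta), 1)
  # Differences of cumulative probabilities give category probabilities
  probs <- diff(c(0, cum_p))
  names(probs) <- paste0("cat_", seq_along(probs))
  probs
}

# Hypothetical cool and warm lakes with the same log(area)
probs_cool <- no3_probs(surftemp = 15, log_area = 2)
probs_warm <- no3_probs(surftemp = 28, log_area = 2)
print(round(rbind(cool = probs_cool, warm = probs_warm), 3))
```

With a negative temperature coefficient, the warmer lake places more probability on the lowest NO3 category than the cooler lake.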
The N2O and N2O-eq responses were each modeled with Gamma distributed
errors, but with the same covariate structure as in model 5. The same
structure was also employed for the shape terms in these responses,
corresponding to the \(\sigma\) terms
in the previous model. Though not shown in this document, the Gamma
error structure appeared to result in slightly better performance in the
predictive checks compared to the Gaussian errors in previous models.
This was primarily apparent in the saturation ratio checks, which may
have been more sensitive to model performance in the tails of the N2O
responses. Others have also indicated that the Gamma error distribution
can work well for dissolved N2O data (Webb et al.
2019).
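The link between the Gamma shape terms and the earlier \(\sigma\) terms can be illustrated with the mean and shape parameterization, in which the rate equals shape divided by the mean, so the coefficient of variation is 1/sqrt(shape). The sketch below uses the posterior means for `shape_n2o_Intercept` and `shape_n2o_log_area` from the summary further below, evaluated at the lowest NO3 level and with group-level effects set to zero. The `log_area` values and the mean of 10 are hypothetical.

```r
# Posterior means from the final model summary further below
shape_intercept <- 3.215   # shape_n2o_Intercept (log scale)
shape_log_area <- 0.190    # shape_n2o_log_area

log_area <- c(1, 3, 5, 7)  # hypothetical covariate values
shape <- exp(shape_intercept + shape_log_area * log_area)

# Under the mean/shape parameterization, CV = 1 / sqrt(shape)
cv_theory <- 1 / sqrt(shape)

# Check by simulation with a hypothetical mean of 10
set.seed(99)
mu <- 10
cv_sim <- sapply(shape, function(a) {
  y <- rgamma(1e5, shape = a, rate = a / mu)
  sd(y) / mean(y)
})

print(round(rbind(log_area, shape, cv_theory, cv_sim), 3))
```

A positive coefficient on the shape therefore implies less relative variability in larger lakes, which is the same direction as a negative coefficient on \(\sigma\) in the Gaussian models.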
Note that there was no residual correlation term for this model,
since the residuals are undefined for the Gamma and cumulative logit
models. Dropping the observation-level residual correlation term was
deemed a reasonable compromise that enabled modeling the missingness of
NO3, in particular. Nevertheless, the random intercepts again allowed
for potential correlations between responses at the group levels.
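The group-level correlation structure mentioned above can be sketched by simulating correlated random intercepts for two responses that share a grouping ID. The standard deviations and correlation below are the posterior means for the N2O and NO3 intercepts at the WSA9:state level from the summary further below. Restricting the example to two responses and simulating from point estimates are simplifications made for illustration.

```r
# Posterior means from the final model summary (WSA9:state level)
sds <- c(n2o = 0.047, no3cat = 0.878)  # sd(n2o_Intercept), sd(no3cat_Intercept)
rho <- 0.462                           # cor(n2o_Intercept, no3cat_Intercept)

# Covariance matrix built from standard deviations and the correlation
R <- matrix(c(1, rho, rho, 1), nrow = 2)
Sigma <- diag(sds) %*% R %*% diag(sds)

set.seed(2024)
n_groups <- 96                         # number of WSA9:state levels
z <- matrix(rnorm(n_groups * 2), ncol = 2)

# chol() returns the upper factor U with t(U) %*% U = Sigma,
# so the rows of z %*% U are draws from MVN(0, Sigma)
u <- z %*% chol(Sigma)
colnames(u) <- names(sds)

print(round(cor(u)[1, 2], 2))
```

Each row of `u` holds the pair of intercept deviations for one group, so groups with higher-than-average N2O intercepts tend to have higher NO3 intercepts as well.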
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
bf_n2o <- bf(n2o ~ mo(no3_cat) +
log_area +
surftemp +
mo(no3_cat):log_area +
mo(no3_cat):surftemp +
(mo(no3_cat) | a | WSA9) +
(mo(no3_cat) | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
shape ~ log_area +
mo(no3_cat) +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = Gamma(link = "log"))
bf_n2oeq <- bf(n2o_eq ~ surftemp +
log_elev +
surftemp:log_elev +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
shape ~ surftemp +
log_elev +
(1 | WSA9) +
(1 | WSA9:state) +
(1 | WSA9:state:size_cat),
family = Gamma(link = "log"))
bf_temp <- bf(surftemp ~ lat +
s(log_elev) +
s(jdate) +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
shape ~ lat,
family = Gamma(link = "log"))
bf_no3 <- bf(no3_cat ~ surftemp +
log_area +
(1 | a | WSA9) +
(1 | b | WSA9:state) +
(1 | c | WSA9:state:size_cat),
family = cumulative(link = "logit", threshold="flexible"))
priors <- c(
prior(normal(2, 1), class = "Intercept", resp = "n2o"),
prior(normal(0, 1), class = "b", resp = "n2o"),
prior(exponential(2), class = "sd", resp = "n2o"),
prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2o"),
prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2o"),
prior(exponential(2), class = "sd", dpar = "shape", resp = "n2o"),
prior(normal(2, 1), class = "Intercept", resp = "n2oeq"),
prior(normal(0, 1), class = "b", resp = "n2oeq"),
prior(exponential(2), class = "sd", resp = "n2oeq"),
prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2oeq"),
prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2oeq"),
prior(exponential(2), class = "sd", dpar = "shape", resp = "n2oeq"),
prior(normal(3, 1), class = "Intercept", resp = "surftemp"),
prior(normal(0, 1), class = "b", resp = "surftemp"),
prior(exponential(0.5), class = "sds", resp = "surftemp"),
prior(exponential(2), class = "sd", resp = "surftemp"),
prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "surftemp"),
prior(normal(0, 1), class = "b", dpar = "shape", resp = "surftemp"),
prior(normal(0, 3), class = "Intercept", resp = "no3cat"),
prior(normal(0, 1), class = "b", resp = "no3cat"),
prior(exponential(1), class = "sd", resp = "no3cat"),
prior(lkj(2), class = "cor")
)
n2o_mod6 <- brm(bf_n2o +
bf_n2oeq +
bf_temp +
bf_no3 +
set_rescor(rescor = FALSE),
data = df_model,
prior = priors,
control = list(adapt_delta = 0.975, max_treedepth = 14),
#sample_prior = "only",
save_pars = save_pars(all = TRUE),
seed = 85132,#14548,
#init = my_inits,
init_r = 0.5,
chains=4,
iter=5000,
cores=4)
save(n2o_mod6, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
Summarize fit
Below is a summary of the fitted parameters and MCMC diagnostics.
Family: MV(gamma, gamma, gamma, cumulative)
Links: mu = log; shape = log
mu = log; shape = log
mu = log; shape = log
mu = logit; disc = identity
Formula: n2o ~ mo(no3_cat) + log_area + surftemp + mo(no3_cat):log_area + mo(no3_cat):surftemp + (mo(no3_cat) | a | WSA9) + (mo(no3_cat) | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
shape ~ log_area + mo(no3_cat) + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
n2o_eq ~ surftemp + log_elev + surftemp:log_elev + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
shape ~ surftemp + log_elev + (1 | WSA9) + (1 | WSA9:state) + (1 | WSA9:state:size_cat)
surftemp ~ lat + s(log_elev) + s(jdate) + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
shape ~ lat
no3_cat ~ surftemp + log_area + (1 | a | WSA9) + (1 | b | WSA9:state) + (1 | c | WSA9:state:size_cat)
Data: df_model (Number of observations: 984)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Priors:
b_n2o ~ normal(0, 1)
b_n2o_shape ~ normal(0, 1)
b_n2oeq ~ normal(0, 1)
b_n2oeq_shape ~ normal(0, 1)
b_no3cat ~ normal(0, 1)
b_surftemp ~ normal(0, 1)
b_surftemp_shape ~ normal(0, 1)
Intercept_n2o ~ normal(2, 1)
Intercept_n2o_shape ~ normal(5, 4)
Intercept_n2oeq ~ normal(2, 1)
Intercept_n2oeq_shape ~ normal(5, 4)
Intercept_no3cat ~ normal(0, 3)
Intercept_surftemp ~ normal(3, 1)
Intercept_surftemp_shape ~ normal(5, 4)
L ~ lkj_corr_cholesky(2)
<lower=0> sd_n2o ~ exponential(2)
<lower=0> sd_n2o_shape ~ exponential(2)
<lower=0> sd_n2oeq ~ exponential(2)
<lower=0> sd_n2oeq_shape ~ exponential(2)
<lower=0> sd_no3cat ~ exponential(1)
<lower=0> sd_surftemp ~ exponential(2)
<lower=0> sds_surftemp ~ exponential(0.5)
simo_n2o_mono3_cat:log_area1 ~ dirichlet(1)
simo_n2o_mono3_cat:surftemp1 ~ dirichlet(1)
simo_n2o_mono3_cat1 ~ dirichlet(1)
simo_n2o_shape_mono3_cat1 ~ dirichlet(1)
Smooth Terms:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(surftemp_slog_elev_1) 1.161 0.370 0.638 2.079 1.001 2264 4174
sds(surftemp_sjdate_1) 0.571 0.277 0.226 1.273 1.000 2679 4853
Group-Level Effects:
~WSA9 (Number of levels: 9)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(n2o_Intercept) 0.048 0.018 0.020 0.091 1.000
sd(n2o_mono3_cat) 0.045 0.035 0.002 0.129 1.003
sd(n2oeq_Intercept) 0.033 0.011 0.018 0.061 1.000
sd(surftemp_Intercept) 0.031 0.014 0.011 0.064 1.001
sd(no3cat_Intercept) 0.690 0.256 0.296 1.308 1.000
sd(shape_n2o_Intercept) 0.211 0.141 0.011 0.535 1.000
sd(shape_n2oeq_Intercept) 0.402 0.183 0.075 0.814 1.001
cor(n2o_Intercept,n2o_mono3_cat) -0.049 0.336 -0.667 0.614 1.000
cor(n2o_Intercept,n2oeq_Intercept) 0.393 0.268 -0.202 0.824 1.000
cor(n2o_mono3_cat,n2oeq_Intercept) 0.102 0.328 -0.559 0.686 1.000
cor(n2o_Intercept,surftemp_Intercept) -0.350 0.296 -0.839 0.285 1.000
cor(n2o_mono3_cat,surftemp_Intercept) 0.089 0.333 -0.564 0.703 1.000
cor(n2oeq_Intercept,surftemp_Intercept) -0.167 0.299 -0.705 0.437 1.000
cor(n2o_Intercept,no3cat_Intercept) -0.057 0.295 -0.605 0.522 1.000
cor(n2o_mono3_cat,no3cat_Intercept) 0.141 0.333 -0.539 0.724 1.002
cor(n2oeq_Intercept,no3cat_Intercept) 0.272 0.274 -0.305 0.740 1.001
cor(surftemp_Intercept,no3cat_Intercept) 0.185 0.300 -0.436 0.718 1.000
Bulk_ESS Tail_ESS
sd(n2o_Intercept) 3011 2931
sd(n2o_mono3_cat) 1540 3502
sd(n2oeq_Intercept) 3147 4572
sd(surftemp_Intercept) 3705 4498
sd(no3cat_Intercept) 4184 5618
sd(shape_n2o_Intercept) 2379 3641
sd(shape_n2oeq_Intercept) 2208 1617
cor(n2o_Intercept,n2o_mono3_cat) 6558 6134
cor(n2o_Intercept,n2oeq_Intercept) 4604 6010
cor(n2o_mono3_cat,n2oeq_Intercept) 3241 5029
cor(n2o_Intercept,surftemp_Intercept) 4744 5916
cor(n2o_mono3_cat,surftemp_Intercept) 5246 6447
cor(n2oeq_Intercept,surftemp_Intercept) 7342 7508
cor(n2o_Intercept,no3cat_Intercept) 5152 6479
cor(n2o_mono3_cat,no3cat_Intercept) 3435 5897
cor(n2oeq_Intercept,no3cat_Intercept) 6449 7400
cor(surftemp_Intercept,no3cat_Intercept) 6490 7928
~WSA9:state (Number of levels: 96)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(n2o_Intercept) 0.047 0.013 0.021 0.072 1.004
sd(n2o_mono3_cat) 0.101 0.019 0.068 0.142 1.001
sd(n2oeq_Intercept) 0.033 0.003 0.026 0.040 1.001
sd(surftemp_Intercept) 0.035 0.006 0.023 0.047 1.000
sd(no3cat_Intercept) 0.878 0.128 0.649 1.147 1.000
sd(shape_n2o_Intercept) 0.383 0.178 0.032 0.707 1.004
sd(shape_n2oeq_Intercept) 0.561 0.114 0.336 0.781 1.002
cor(n2o_Intercept,n2o_mono3_cat) -0.145 0.254 -0.602 0.376 1.002
cor(n2o_Intercept,n2oeq_Intercept) 0.183 0.189 -0.180 0.549 1.003
cor(n2o_mono3_cat,n2oeq_Intercept) 0.201 0.164 -0.122 0.519 1.002
cor(n2o_Intercept,surftemp_Intercept) -0.023 0.237 -0.476 0.438 1.004
cor(n2o_mono3_cat,surftemp_Intercept) -0.232 0.215 -0.644 0.194 1.001
cor(n2oeq_Intercept,surftemp_Intercept) -0.134 0.190 -0.494 0.246 1.000
cor(n2o_Intercept,no3cat_Intercept) 0.462 0.211 0.018 0.822 1.004
cor(n2o_mono3_cat,no3cat_Intercept) 0.145 0.200 -0.251 0.531 1.000
cor(n2oeq_Intercept,no3cat_Intercept) 0.054 0.140 -0.220 0.329 1.000
cor(surftemp_Intercept,no3cat_Intercept) -0.231 0.189 -0.586 0.154 1.001
Bulk_ESS Tail_ESS
sd(n2o_Intercept) 1025 1337
sd(n2o_mono3_cat) 3366 4975
sd(n2oeq_Intercept) 2809 4611
sd(surftemp_Intercept) 4124 5116
sd(no3cat_Intercept) 4389 6247
sd(shape_n2o_Intercept) 649 1304
sd(shape_n2oeq_Intercept) 1908 1810
cor(n2o_Intercept,n2o_mono3_cat) 1069 1955
cor(n2o_Intercept,n2oeq_Intercept) 794 1260
cor(n2o_mono3_cat,n2oeq_Intercept) 833 1873
cor(n2o_Intercept,surftemp_Intercept) 1849 3618
cor(n2o_mono3_cat,surftemp_Intercept) 2626 4267
cor(n2oeq_Intercept,surftemp_Intercept) 6340 6860
cor(n2o_Intercept,no3cat_Intercept) 893 1944
cor(n2o_mono3_cat,no3cat_Intercept) 1746 3085
cor(n2oeq_Intercept,no3cat_Intercept) 6378 7983
cor(surftemp_Intercept,no3cat_Intercept) 3573 5949
~WSA9:state:size_cat (Number of levels: 352)
Estimate Est.Error l-95% CI u-95% CI Rhat
sd(n2o_Intercept) 0.038 0.014 0.006 0.064 1.011
sd(n2oeq_Intercept) 0.004 0.002 0.000 0.008 1.009
sd(surftemp_Intercept) 0.010 0.007 0.000 0.024 1.003
sd(no3cat_Intercept) 0.308 0.182 0.016 0.673 1.002
sd(shape_n2o_Intercept) 0.895 0.110 0.684 1.114 1.002
sd(shape_n2oeq_Intercept) 0.511 0.104 0.309 0.715 1.002
cor(n2o_Intercept,n2oeq_Intercept) 0.310 0.349 -0.486 0.841 1.004
cor(n2o_Intercept,surftemp_Intercept) 0.015 0.365 -0.677 0.705 1.000
cor(n2oeq_Intercept,surftemp_Intercept) -0.102 0.381 -0.761 0.659 1.001
cor(n2o_Intercept,no3cat_Intercept) -0.090 0.342 -0.716 0.601 1.001
cor(n2oeq_Intercept,no3cat_Intercept) -0.143 0.359 -0.757 0.611 1.001
cor(surftemp_Intercept,no3cat_Intercept) 0.173 0.378 -0.599 0.807 1.001
Bulk_ESS Tail_ESS
sd(n2o_Intercept) 580 1036
sd(n2oeq_Intercept) 1062 3160
sd(surftemp_Intercept) 1917 3889
sd(no3cat_Intercept) 1078 2386
sd(shape_n2o_Intercept) 1068 2786
sd(shape_n2oeq_Intercept) 1397 2071
cor(n2o_Intercept,n2oeq_Intercept) 2099 4705
cor(n2o_Intercept,surftemp_Intercept) 4876 6414
cor(n2oeq_Intercept,surftemp_Intercept) 4195 5741
cor(n2o_Intercept,no3cat_Intercept) 3188 5334
cor(n2oeq_Intercept,no3cat_Intercept) 2749 4946
cor(surftemp_Intercept,no3cat_Intercept) 2607 5750
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
n2o_Intercept 2.392 0.056 2.285 2.500 1.000 3063 5517
shape_n2o_Intercept 3.215 0.189 2.849 3.589 1.001 3997 6108
n2oeq_Intercept 3.111 0.053 3.011 3.218 1.000 5146 6075
shape_n2oeq_Intercept 3.873 0.751 2.385 5.326 1.001 5619 6535
surftemp_Intercept 3.791 0.060 3.672 3.906 1.000 7414 7632
shape_surftemp_Intercept 8.637 0.460 7.721 9.522 1.001 10817 7718
no3cat_Intercept[1] -3.025 0.615 -4.258 -1.864 1.000 5932 6622
no3cat_Intercept[2] -2.059 0.605 -3.274 -0.903 1.000 6278 6695
no3cat_Intercept[3] -1.027 0.600 -2.235 0.115 1.000 6689 6957
no3cat_Intercept[4] -0.028 0.602 -1.247 1.113 1.000 7002 6940
n2o_log_area 0.028 0.003 0.022 0.034 1.000 6958 7668
n2o_surftemp -0.025 0.002 -0.029 -0.021 1.000 3145 6260
shape_n2o_log_area 0.190 0.041 0.110 0.271 1.000 4322 5605
n2oeq_surftemp -0.042 0.002 -0.045 -0.038 1.000 5836 6494
n2oeq_log_elev -0.080 0.008 -0.097 -0.064 1.000 5425 6234
n2oeq_surftemp:log_elev 0.002 0.000 0.002 0.003 1.000 5857 6696
shape_n2oeq_surftemp 0.131 0.020 0.092 0.171 1.001 6725 7112
shape_n2oeq_log_elev 0.030 0.077 -0.117 0.186 1.001 4914 6636
surftemp_lat -0.016 0.001 -0.019 -0.013 1.000 7349 7598
shape_surftemp_lat -0.105 0.011 -0.127 -0.083 1.000 11330 7919
no3cat_surftemp -0.141 0.023 -0.187 -0.096 1.001 6713 7328
no3cat_log_area 0.068 0.035 -0.001 0.137 1.000 11692 8177
surftemp_slog_elev_1 -3.477 0.479 -4.414 -2.557 1.000 5891 6330
surftemp_sjdate_1 -0.008 0.528 -1.074 1.017 1.000 4010 5523
n2o_mono3_cat 0.007 0.087 -0.172 0.175 1.004 1650 1855
n2o_mono3_cat:log_area -0.046 0.009 -0.063 -0.027 1.001 2079 2821
n2o_mono3_cat:surftemp 0.018 0.004 0.009 0.026 1.004 1417 1621
shape_n2o_mono3_cat -0.510 0.070 -0.649 -0.375 1.001 4691 6283
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
n2o_mono3_cat1[1] 0.025 0.023 0.001 0.083 1.000 4899 5237
n2o_mono3_cat1[2] 0.183 0.111 0.014 0.425 1.002 1056 3343
n2o_mono3_cat1[3] 0.408 0.158 0.103 0.725 1.002 2167 4583
n2o_mono3_cat1[4] 0.384 0.169 0.058 0.713 1.003 1608 3021
n2o_mono3_cat:log_area1[1] 0.046 0.029 0.003 0.112 1.000 4018 3309
n2o_mono3_cat:log_area1[2] 0.033 0.028 0.001 0.104 1.001 6173 4948
n2o_mono3_cat:log_area1[3] 0.289 0.134 0.057 0.598 1.000 3286 4461
n2o_mono3_cat:log_area1[4] 0.631 0.140 0.301 0.864 1.000 3186 4184
n2o_mono3_cat:surftemp1[1] 0.028 0.019 0.003 0.060 1.001 3744 3410
n2o_mono3_cat:surftemp1[2] 0.066 0.033 0.011 0.128 1.001 3821 3213
n2o_mono3_cat:surftemp1[3] 0.281 0.079 0.122 0.431 1.003 2897 2971
n2o_mono3_cat:surftemp1[4] 0.625 0.088 0.461 0.798 1.003 2740 3106
shape_n2o_mono3_cat1[1] 0.116 0.067 0.009 0.263 1.001 5363 4297
shape_n2o_mono3_cat1[2] 0.149 0.096 0.008 0.365 1.001 3469 3637
shape_n2o_mono3_cat1[3] 0.401 0.151 0.116 0.703 1.001 4122 5848
shape_n2o_mono3_cat1[4] 0.334 0.143 0.051 0.604 1.001 3853 4561
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc_no3cat 1.000 0.000 1.000 1.000 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Model checks
Below, the same PPCs for N2O and N2O-eq were employed as before.
N2O PPC
The PPCs for N2O from this model were similarly reasonable as
for models 4 and 5 above.

Equilibrium N2O PPC
Again, the PPCs for N2O-eq in this model were similar to those for
models 4 and 5.

Bivariate PPC
This model again provided a very reasonable representation of the
bivariate relationship between N2O and N2O-eq (below).

Saturation PPC
The saturation ratio PPCs below show similar behavior as with models
4 and 5 above, but with perhaps slightly less bias in the predictions
for the proportion of undersaturated waterbodies and fewer extreme
predictions for the means and standard deviations. The observed
vs. predicted PPC also appears to have a better behaved
variance and no extreme predictions, compared to models 4 and 5 with the
lognormal errors.

The plot below shows the same PPC, but for the “test” or second-visit
data. Overall, the model appeared to perform similarly on these data as
on the data used to fit it.

R-square
Below are estimates for the Bayesian \(R^2\), which were largely similar for N2O
and N2O-eq as with models 4 and 5 above. The \(R^2\) for the surface temperature response
also suggested a fairly good fit.
Estimate Est.Error Q2.5 Q97.5
R2n2o 0.646 0.059 0.503 0.731
Estimate Est.Error Q2.5 Q97.5
R2n2oeq 0.863 0.006 0.851 0.874
Estimate Est.Error Q2.5 Q97.5
R2surftemp 0.744 0.01 0.723 0.763
Below are the same \(R^2\)
estimates, but for the second-visit data. These estimates are
similar to those for the data used to fit the model, suggesting that the
model may perform similarly well out-of-sample.
Estimate Est.Error Q2.5 Q97.5
R2n2o 0.607 0.137 0.325 0.85
Estimate Est.Error Q2.5 Q97.5
R2n2oeq 0.857 0.008 0.84 0.872
Estimate Est.Error Q2.5 Q97.5
R2surftemp 0.75 0.018 0.715 0.783
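For readers unfamiliar with the statistic, the sketch below illustrates one common variance-ratio form of a Bayesian \(R^2\), computed once per posterior draw and then summarized. Whether this matches the exact computation behind the tables above is an assumption; the simulated data and the names `y`, `mu_draws`, and `bayes_r2` are hypothetical and are not taken from the analysis.

```r
# Illustrative only: toy data and hypothetical names, not the fitted NLA model.
set.seed(1)
n <- 50    # observations
S <- 200   # posterior draws
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Pretend posterior draws of an intercept and slope (made-up values)
b0 <- rnorm(S, 1, 0.1)
b1 <- rnorm(S, 2, 0.1)

# S x n matrix of expected values, one row per draw
mu_draws <- outer(b0, rep(1, n)) + outer(b1, x)

# Per-draw R^2: variance of fit / (variance of fit + variance of residuals)
bayes_r2 <- function(y, mu_draws) {
  var_fit <- apply(mu_draws, 1, var)
  # sweep() subtracts y from every row; var(mu - y) equals var(y - mu)
  var_res <- apply(sweep(mu_draws, 2, y), 1, var)
  var_fit / (var_fit + var_res)
}

r2 <- bayes_r2(y, mu_draws)
round(c(Estimate = mean(r2), Est.Error = sd(r2),
        quantile(r2, c(0.025, 0.975))), 3)
```

Because the ratio is computed for every draw, the result is a distribution rather than a single number, which is why the tables report an estimate, an error, and interval bounds.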
Covariate
effects
N2O
The conditional effects plot for the covariate effects on N2O suggested
a similar effect of NO3, but interesting interactions between NO3 and
lake area and NO3 and surface temperature. For lake area, the effect was
estimated to be larger and more negative at the highest levels of NO3;
and slightly negative at the lowest level of NO3. For surface
temperature, the effect was estimated to be largest and positive at the
highest level of NO3; and negative at the lowest level of NO3.

The estimated covariate effects on \(\sigma\) suggested a negative relationship
with log(area) and a positive relationship, again, with NO3.

Equilibrium
N2O
The estimated covariate effect on N2O-eq remained largely the same as
estimated in the previous model.


Predict to
population
As previously described, in order to make inferences to the
population of interest, the final model above was used to, first,
predict surface temperature in the target population, since it depended
only on the fully observed covariates. Next, the predictive distribution
for surface temperature was used, along with the relevant fully observed
covariates, to predict NO3 in the target population. Finally, the
predictive distributions for temperature and NO3 were used to predict
the N2O responses. The code for these steps is outlined in the
following.
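Before the actual code, the three-step chaining described above can be sketched on a toy population. This is a minimal illustration only: the generative steps use made-up coefficients in base R rather than the fitted model, and every name and value in it (`pop`, `grid`, `p_high`, the coefficients) is hypothetical.

```r
# Illustrative only: toy generative steps with made-up coefficients,
# not the fitted brms model.
set.seed(7)
n_lakes <- 3
n_draws <- 5
pop <- data.frame(lake = seq_len(n_lakes), log_area = c(1.5, 3.0, 6.0))

# One row per lake-by-draw combination (merge with no shared columns
# returns the Cartesian product)
grid <- merge(pop, data.frame(draw = seq_len(n_draws)))

# Step 1: predict surface temperature from fully observed covariates
grid$surftemp <- rnorm(nrow(grid), mean = 25 - 0.5 * grid$log_area, sd = 1)

# Step 2: predict the NO3 category using the predicted temperature
p_high <- plogis(-1 + 0.05 * grid$surftemp)
grid$no3_cat <- rbinom(nrow(grid), size = 4, prob = p_high) + 1  # levels 1..5

# Step 3: predict N2O using the predicted temperature and NO3 category
grid$n2o <- rlnorm(nrow(grid),
                   meanlog = 2 + 0.1 * grid$no3_cat - 0.01 * grid$surftemp,
                   sdlog = 0.2)
head(grid)
```

Each later step conditions on the simulated values from the earlier steps, so uncertainty in temperature and NO3 carries through to the N2O predictions.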
The first step used the final model to predict surface temperature for the population:
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
predict_temp <- sframe %>%
  mutate(jdate = 205) %>% # same day of year assigned to every lake
  add_predicted_draws(n2o_mod6, resp = c("surftemp"),
                      allow_new_levels = TRUE,
                      cores = 1,
                      ndraws = 500) %>%
  mutate(surftemp = .prediction)
save(predict_temp, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
NO3 was predicted next. Note that the posterior predictive
distribution for NO3 was subsampled in order to minimize excess
simulations.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
temp_X <- predict_temp %>% # select relevant columns as predictors
  ungroup() %>%
  select(WSA9, state, size_cat, log_area, surftemp)
rm(predict_temp) # reduce memory
gc()
# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(5)
doSNOW::registerDoSNOW(cl)
# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)
system.time( # approx 26 hrs with 5 workers & 500 draws from PPD
  predict_no3 <- foreach(sub_X = isplitRows(temp_X, chunkSize = 155299),
                         .combine = 'c',
                         .packages = c("brms"),
                         .options.snow = opts
                         ) %dopar% {
    apply(brms::posterior_predict(n2o_mod6,
                                  newdata = sub_X,
                                  resp = "no3cat",
                                  allow_new_levels = TRUE,
                                  ndraws = 500,
                                  cores = 1), 2, sample, 1)
  }
)
close(pb)
parallel::stopCluster(cl)
save(predict_no3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
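The subsampling idiom used in this step, `apply(..., 2, sample, 1)`, keeps one randomly chosen predictive draw for each row of `newdata`. A minimal sketch with a simulated draws-by-observations matrix follows; the object `pp` is a hypothetical stand-in for the matrix returned by `posterior_predict()` for a single response.

```r
# Illustrative only: simulated stand-in for a draws x observations matrix.
set.seed(42)
n_draws <- 500
n_obs <- 4

# Column j only contains values in [10 * j, 10 * j + 1), so the column
# each sampled value came from can be traced.
pp <- sapply(seq_len(n_obs), function(j) 10 * j + runif(n_draws))

# One value sampled at random from each column (i.e., per observation)
one_draw <- apply(pp, 2, sample, 1)
one_draw
```

The result is a vector with one element per observation rather than a 500-column block per observation. Because each row of `newdata` in the workflow already corresponds to one surface temperature draw, keeping a single draw per row appears to be what keeps the total number of simulations manageable.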
Finally, N2O and N2O-eq were predicted using the surface temperature
and nitrate predictions along with the survey variables and known
covariates. Again, the posterior was subsampled in order to reduce
excess simulations.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
# Assemble dataframe containing relevant covariates (known and predicted)
n2o_X <- predict_temp %>%
ungroup() %>%
mutate(no3_cat = predict_no3) %>%
select(WSA9,
state,
size_cat,
log_area,
surftemp,
log_elev,
no3_cat)
# clear objects to reduce memory overhead
rm(predict_no3, predict_temp)
gc()
# save the predictors for n2o and n2oeq
save(n2o_X, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")
# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(6)
doSNOW::registerDoSNOW(cl)
# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)
# make predictions in parallel
system.time(
  predict_n2o <- foreach(sub_X = isplitRows(n2o_X, chunkSize = 155299),
                         .combine = rbind,
                         .options.snow = opts,
                         .packages = c("brms")) %dopar% {
    apply(posterior_predict(n2o_mod6,
                            newdata = sub_X,
                            resp = c("n2o", "n2oeq"),
                            allow_new_levels = TRUE,
                            ndraws = 500,
                            cores = 1),
          2, sample, 1)
  }
)
close(pb)
parallel::stopCluster(cl)
colnames(predict_n2o) <- c("n2o", "n2oeq")
save(predict_n2o, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
Finally, the predictions for all four partially observed responses
were assembled into a new dataframe for use in inference.
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
all_predictions <- predict_temp %>%
  ungroup() %>%
  mutate(no3cat = predict_no3) %>%
  bind_cols(predict_n2o) %>%
  mutate(n2osat = n2o / n2oeq, # calculate saturation ratio
         .row = rep(1:465897, each = 500),
         .draw = rep(seq(1, 500, 1), 465897)) %>%
  mutate(area_ha = exp(log_area)) %>% # include area on ha scale
  select(WSA9,
         state,
         size_cat,
         area_ha,
         lat,
         lon,
         .row,
         .draw,
         surftemp,
         no3cat,
         n2o,
         n2oeq,
         n2osat)
rm(predict_n2o, predict_temp, predict_no3) # clean up workspace for RAM
gc()
save(all_predictions, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
---
title: "Bayesian models for NLA 2017 $N_2O$ survey data"
author: "Roy Martin, Jake Beaulieu, Michael McManus"
date: "`r Sys.Date()`"
output:
  pdf_document:
    toc: yes
    toc_depth: '4'
  html_notebook:
    toc: yes
    toc_depth: 4
    toc_float: yes
    code_folding: show
    font_size: 14
    number_sections: yes
    theme: simplex
  github_document:
    toc: yes
    toc_depth: 4
    toc_float: yes
    code_folding: show
    font_size: 14
    number_sections: yes
    theme: simplex
bibliography: RWM_Endnote_Library.bib
link-citations: yes
editor_options:
  chunk_output_type: inline
---

```{r begin, eval=TRUE, include=FALSE}
library(ggpubr)
library(moments)
library(ggplot2)
library(ggExtra)
library(gridExtra)
library(kableExtra)
library(ggrepel)
library(dplyr)
library(tidyverse)
library(tidyr)
library(future)
library(foreach)
library(itertools)
library(bayesplot)
library(tidybayes)
library(brms)

options(mc.cores = parallel::detectCores(logical = FALSE))
options( max.print = 1000 )

# Identify local path for each user
localPath <- Sys.getenv("USERPROFILE")

# Define helper functions
# standardized formatting for column names
toEPA <- function(X1){
  names(X1) = tolower(names(X1))
  names(X1) = gsub(pattern = c("\\(| |#|)|/|-|\\+|:|_"), replacement = ".", x = names(X1))
  X1
}

# stat: skew 
skew <- function(x) {
  xdev <- x - mean(x)
  n <- length(x)
  r <- sum(xdev^3) / sum(xdev^2)^1.5
  return(r * sqrt(n) * (1 - 1/n)^1.5)
}

# function for DHARMa residual analysis
check_brms <- function(model,             # brms model
                       integer = FALSE,   # integer response? (TRUE/FALSE)
                       plot = TRUE,       # make plot?
                       resp = NULL,
                       ...                # further arguments for DHARMa::plotResiduals 
) {
  
  mdata <- brms::standata(model)
  if(!"Y" %in% names(mdata))
    oResp <- mdata[[paste0("Y_", resp)]]
  else
    oResp <- mdata[["Y"]]
  #  stop("Cannot extract the required information from this brms model")
  
  dharma.obj <- DHARMa::createDHARMa(
    simulatedResponse = t(brms::posterior_predict(model, resp = resp, ndraws = 1000)),
    observedResponse = oResp, 
    fittedPredictedResponse = apply(
      t(brms::posterior_epred(model, resp = resp, ndraws = 1000, re.form = NA)),
      1,
      median),
    integerResponse = integer)
  
  if (isTRUE(plot)) {
    plot(dharma.obj, ...)
  }
  
  invisible(dharma.obj)
  
}
```

# Rationale and Objectives
This document details the modeling workflow implemented for estimating dissolved and equilibrium N2O concentrations and saturation ratios using the 2017 National Lakes Assessment (NLA) survey data. The NLA sampling sites were distributed among the target population of US lakes (in the lower 48 states) according to a probabilistic survey design with samples stratified among categories of lake surface area, WSA9 ecoregion, and US state (excluding AK and HI). Due to the stratification scheme, some types of lakes in the sample population were intentionally over-represented (e.g., large lakes) and some were under-represented (e.g., small lakes) relative to the target population. Because of this unequal probability design, estimates from the sample had to be adjusted before making inferences about the broader populations of interest (e.g., national, state-, ecoregion-, and size class-specific estimates). 

The concept of the "complete data likelihood" is useful for conceptualizing biases arising from sampling design [@Zachmann_etal_2022; @Gelman_etal_2014 Ch. 8; @Link_Barker_2010]. For the NLA survey data, the population of US lakes in the lower 48 states larger than 4 hectares was considered the complete data and the probabilistic samples were considered a subset of that complete data. The portion of US lakes not included in the sample was considered "missing" from the complete data _not_ at random, but conditional on the pre-specified design (stratification) variables. This non-random missingness was not ignorable for the purpose of making inferences from the sample to the target population. In a model-based framework, however, including the design parameters as predictors in a regression model is one way to adjust for the missingness. For a thorough and recent treatment of this concept in the context of national surveys of environmental resources, refer to @Zachmann_etal_2022. This concept is a key motivator for the increasingly popular multilevel regression with poststratification (MRP) approach to model-based inference [@Gelman_etal_2014; @Gelman_etal_2020 Ch. 17].

The following workflow illustrates our model-based approach, based largely on the logic of MRP, but with an elaboration on the poststratification step to enable eventual estimates of total gas flux at the population level, which required scaling up from lake-level estimates. The typical MRP process is carried out in two steps. The first step is to fit regression models for the response variables of interest (e.g., dissolved N2O, equilibrium N2O) conditional on the survey design variables (i.e., ecoregion, state, lake size). The second step is poststratification, wherein the posterior parameter estimates from the regression model for the sample population are weighted based on their known or assumed distribution in the population of interest [i.e., poststratification table; @Gelman_etal_2020 Ch. 17]. The poststratification table in our case, for example, would be a population summary of lakes among the design variables: ecoregion, state, and size category. However, because we eventually needed lake-level estimates, instead of predicting to a poststratification table, we predicted to each individual lake in the population of interest. This meant predicting to the full target population of 465,897 natural and man-made US lakes larger than 4 hectares in the lower 48 states. These predictions were assumed relevant to average conditions during the "index period" for each lake in 2017. Details about the sampling frame as well as the target population are further clarified in the workbook below with data summaries and code. 
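
As a minimal sketch of the weighting in the second MRP step, the toy example below combines cell-level estimates using population shares. The cells, counts, and estimates are hypothetical values chosen for illustration; only the column names `n_lakes` and `prop_cell` mirror those used later in this document.

```r
# Illustrative only: hypothetical cells and values.
cells <- data.frame(
  size_cat  = c("4_10", "10_20", "50_max"),
  n_lakes   = c(700, 250, 50),    # population counts per cell
  cell_mean = c(8.0, 9.0, 12.0)   # model-based estimate per cell
)

# Poststratified estimate: cell estimates weighted by population share
cells$prop_cell <- cells$n_lakes / sum(cells$n_lakes)
post_mean <- sum(cells$prop_cell * cells$cell_mean)
post_mean                 # 8.45

# Unweighted mean of the cell estimates, for comparison
mean(cells$cell_mean)
```

The weighted estimate sits closer to the value for the most numerous cell than the unweighted mean does. Predicting to every lake individually, as done here, can be viewed as the same calculation with each lake contributing equal weight.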

For the regressions, we used multilevel models fit in a fully Bayesian fashion. Multilevel models are thought to work well in this context because they provide regularized estimates along the design groupings, which can improve out-of-sample inferences [@McElreath_2020]. Inferences for lake types that may be missing from the sample, but are part of the population of interest, are also straightforward using this approach [@Gelman_etal_2020 Ch. 17; @McElreath_2020]. More information on these models, their specific parameters, R code, fit evaluations, and resulting inferences is presented in this document.

The overriding objective of the modeling effort was to provide population level estimates for (1) dissolved and equilibrium N2O concentrations; (2) the N2O saturation ratio (i.e., dissolved N2O/equilibrium N2O); and (3) the proportion of under-saturated water bodies (i.e., saturation ratio < 1). The estimates would also be used to later estimate the total flux of N2O gas attributable to the target population of lakes over the index period. The saturation ratio estimates were calculated as a derived quantity based on the ratio of modeled dissolved to equilibrium N2O. Because dissolved and equilibrium N2O were observed on the same sample units (lake sites), we developed models for estimating their joint distribution. The response variable in the models was, therefore, multivariate to account for potential statistical dependencies between dissolved and equilibrium N2O due to, for example, common dependencies on geography. Although point predictions of the mean marginal probabilities from separate models could be comparable, a joint model allowing correlated observation-level errors (i.e., residuals) was expected to better capture uncertainty and potentially improve out-of-sample predictions, should the variables be conditionally correlated [@Warton_etal_2015; @Poggiato_etal_2021]. All of the models fit were constructed using the `brms` package [@Burkner_2017] in `R` [@R_Core_Team_2021] as an interface to Stan, a software package for fitting fully Bayesian models via Hamiltonian Monte Carlo [HMC; @Stan_Development_Team_2018_a; @Stan_Development_Team_2018_b; @Stan_Development_Team_2018_c].
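
To make the derived quantities in objectives (2) and (3) concrete, the sketch below computes a saturation ratio and the proportion of under-saturated lakes from simulated predictive draws. All values are simulated and hypothetical; the column names (`.row`, `.draw`, `n2o`, `n2oeq`, `n2osat`) follow those used for the prediction data frame in this document, but the distributions are assumptions made only for illustration.

```r
# Illustrative only: simulated draws with made-up distributions.
set.seed(3)
n_lakes <- 100
n_draws <- 200
pred <- data.frame(
  .row  = rep(seq_len(n_lakes), each = n_draws),
  .draw = rep(seq_len(n_draws), times = n_lakes),
  n2o   = rlnorm(n_lakes * n_draws, meanlog = 2.1, sdlog = 0.3),
  n2oeq = rlnorm(n_lakes * n_draws, meanlog = 2.0, sdlog = 0.1)
)

# Saturation ratio as a derived quantity
pred$n2osat <- pred$n2o / pred$n2oeq

# Within each draw, the proportion of lakes with a ratio below 1
prop_under <- tapply(pred$n2osat < 1, pred$.draw, mean)

# Summarize across draws
c(mean = mean(prop_under), quantile(prop_under, c(0.025, 0.975)))
```

Computing the proportion within each draw and then summarizing across draws yields an interval for the population-level proportion, not just a point estimate.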

# Data
As explained in a previous data munging document
(https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dgIndicatorAnalysis.html), duplicate dissolved gas samples were collected at a depth of ~0.1 m at designated index sites distributed across 1091 lakes nationwide, of which 95 were sampled twice as repeat visits. This randomly selected subset of revisit sites was used as a test set for assessing model fit and out-of-sample performance. 

Gas samples were analyzed via gas chromatography and concentrations were recorded to the nearest 0.001 nmol/L. The samples were collected under a stratified, unequal probability design and each gas observation was indexed to an individual lake selected with unequal probability from 5 different lake size categories, $j = 1, \ldots, J$ with $J = 5$, according to surface area (ha), and from within a state, $k = 1, \ldots, K$ with $K = 48$, situated within an aggregated WSA9 or Omernik ecoregion, $l = 1, \ldots, L$ with $L = 9$. All 9 WSA9 ecoregions were represented in the sample, including Xeric (XER), Western Mountain (WMT), Northern Plains (NPL), Southern Plains (SPL), Temperate Plains (TPL), Coastal Plains (CPL), Upper Midwest (UMW), Northern Appalachian (NAP), and Southern Appalachian (SAP) regions. As shown below, the data from the initial and revisit samples were separately compiled into data frame objects in `R`, with $n=984$ and $n=95$ rows, respectively, of gas observations indexed to the survey design variables and several potentially relevant covariates. 

## Import
The gas data and covariates were previously described and munged at 
https://github.com/USEPA/DissolvedGasNla/blob/master/scripts/dataMunge.html. That dataset was imported below. 
```{r import_data, eval=FALSE, include=TRUE}
load( file = paste0( localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/dg.2021-02-01.RData")
      )

save(dg, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda") 
```

From the imported dataset, a new data frame for modeling was constructed from the original file including only the variables of interest: (1) the N2O gas observations; (2) the survey design variables indexed to those observations; and (3) additional covariates considered potentially useful for improving the fit of the model. The data frame below excluded the second-visit observations, which would later be used for model checking. Some variables from the imported data were renamed for convenience. In addition, the NO3 covariate was rounded according to the documented measurement precision. An alternative version of the NO3 covariate was also created in this step by log-transforming and re-coding it as an ordered factor with five levels at hand-drawn cut points. The left-most cut point separated observations below the detection limit from the completely observed samples. The remaining cut points in the positive direction were drawn at approximately equal distances along the log scale. Finally, it should be noted that one lake that was sampled was missing information on the N2O gas measurements and it was removed from the data frame.
```{r model_data, echo=TRUE, paged.print=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # 1 obs with missing measurement
  nrow() # number of observations before filtering

df_model <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 1) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_model, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda") 

nrow(df_model) # number of obs after filtering

print(df_model)
```
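
The re-coding of NO3 into an ordered factor can be checked on a handful of values. The sketch below applies the same floor and cut points as the chunk above to a few hypothetical concentrations (`no3_raw`), using the `ordered_result` argument of `cut()` in place of the separate `factor()` call.

```r
# Illustrative only: a few hypothetical NO3 values.
no3_raw <- c(0.0002, 0.001, 0.01, 0.1, 1)

# Values at or below 0.0005 are set to 0.0005, as in the chunk above
no3 <- ifelse(no3_raw <= 0.0005, 0.0005, round(no3_raw, 4))

no3_cat <- cut(log(no3),
               breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
               labels = seq(1, 5, 1),
               ordered_result = TRUE)

data.frame(no3, log_no3 = round(log(no3), 2), no3_cat)
```

With these inputs each value falls in a different level: the floored value (log of about -7.6) lands in level 1, below the left-most cut point at -7.5, and the remaining values step through levels 2 to 5.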

A second dataframe, including only the second visit observations, was constructed below. These data were later used as a "test set" to assess the out-of-sample fit of the model developed on the first-visit or training data.
```{r test_data, echo=TRUE, paged.print=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/dg.rda")

# number of observations before filtering probability samples
dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>% # remove obs with missing response measurements
  nrow()

df_test <- dg %>%
  filter(!is.na(dissolved.n2o.nmol)) %>%
  filter(sitetype == "PROB") %>% # probability samples only
  filter(visit.no == 2) %>%
  mutate(n2o = round(dissolved.n2o.nmol, 2),
         n2o_eq = round(sat.n2o.nmol, 2),
         n2o_sat = n2o.sat.ratio,
         n2o_em = e.n2o.nmol.d,
         n2o_flux = f.n2o.m.d,
         WSA9 = factor(ag.eco9),
         state = factor(state.abb[match(state.nm, state.name)]),
         area_ha = area.ha,
         log_area = log(area_ha),
         chla = chla.result,
         log_chla = log(chla),
         elev = elevation,
         log_elev = log(elev + 1),
         do_surf = o2.surf,
         log_do = log(do_surf),
         bf_max = max.bf,
         sqrt_bf = sqrt(bf_max),
         size_cat = recode(area.cat6, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat,
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  mutate(no3 = ifelse(nitrate.n.result <= 0.0005, 0.0005, round(nitrate.n.result, 4))) %>%# 1/2 mdl 0.01
  mutate(no3_cat = cut(log(no3), # convert no3 to ordered factor with 5 levels
                       breaks = c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf),
                       labels =seq(1, 5, 1))) %>%
  mutate(no3_cat = factor(no3_cat,
                          levels = seq(1, 5, 1),
                          ordered = TRUE)) %>%
  mutate(date = as.Date(date.col)) %>%
  mutate(jdate = as.numeric(format(date, "%j"))) %>% 
  mutate(lat = map.lat.dd,
         lon = map.lon.dd) %>% # longitude
  mutate(surftemp = surftemp,
         log_surftemp = log(surftemp)) %>% 
  select(WSA9,
         state,
         size_cat,
         site.id,
         lat,
         lon,
         date,
         jdate,
         surftemp,
         log_surftemp,
         area_ha,
         log_area,
         elev,
         log_elev,
         chla,
         log_chla,
         do_surf,
         log_do,
         bf_max,
         sqrt_bf,
         n2o,
         n2o_eq,
         no3,
         no3_cat
         )

save(df_test, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda") 

nrow(df_test) # number of obs after filtering for probability samples and second visits

print(df_test)
```

## Target population
Below, the NLA sampling frame was imported and then filtered to include only the target population or sampling frame for this project.
```{r import_sample_frame, echo=TRUE}
df_pop <- read.csv(file = paste0(localPath,
              "/Environmental Protection Agency (EPA)/",
              "ORD NLA17 Dissolved Gas - Documents/",
              "inputData/NLA_Sample_Frame.csv"), header = T)

sframe <- df_pop %>%
  filter(nla17_sf != "Exclude2017") %>%
  filter(nla17_sf != "Exclude2017_Include2017NH") %>%
  filter(state != "DC") %>%
  filter(state != "HI") %>%
  droplevels() %>%
  mutate(WSA9 = factor(ag_eco9),
         WSA9 = forcats::fct_drop(WSA9), # remove NA level
         state = factor(state),
         size_cat = factor(area_cat6),
         lat = lat_dd83,
         lon = lon_dd83,
         log_area = log(area_ha),
         elev = elevation,
         log_elev = ifelse(elev <= 0, 0, elev), # assumed elev < 0 to be elev = 0
         log_elev = log(log_elev + 1)
         ) %>% 
  mutate(size_cat = recode(size_cat, 
                           "(1,4]" = "min_4" ,
                           "(10,20]" = "10_20",
                           "(20,50]" = "20_50",
                           "(4,10]" = "4_10",
                           ">50" = "50_max")) %>%
  mutate(size_cat = factor(size_cat, 
                           levels = c("min_4", "4_10", "10_20", "20_50", "50_max"),
                           ordered = TRUE)) %>%
  select(WSA9, state, size_cat, lat, lon, area_ha, log_area, elev, log_elev)

rm(df_pop)

save(sframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda") 

print(sframe)
```

The resulting target population above included a total of 465,897 waterbodies.

Cross tabulations below describe the structure of the target population with respect to the design variables. The cross-tabulation makes it clear that not every ecoregion contains every state. Therefore, in the statistical sense, states were nested in ecoregions.
```{r frame_dimensions_1, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(state, n) %>%
  print()
```

Likewise, lake size category was nested in state (which was nested in ecoregion). That is, not every ecoregion:state in the population of interest contained every size category (below).
```{r frame_dimensions_4, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

sframe %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(size_cat, n) %>%
  print()
```

Below, the sampling frame was aggregated to create a post-stratification table. Some of the variables were renamed to match the naming conventions used in the observational data above. There were 536 types of lakes in the population of interest with respect to the sampling design. The counts of those lake types (n_lakes) and their proportions relative to the total population of lakes in the sampling frame (prop_cell) are indicated below.
```{r filter_frame, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")

pframe <- sframe %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarise(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = n_lakes/sum(n_lakes)) %>%
  mutate(type = "population") 

save(pframe, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

print(pframe)
```

## Sample vs. population
Below, the lake distributions in the population of interest were compared to the proportions in the observed sample. There were 352 lake types in the sample compared to the 536 in the population of interest. In total, there were 984 observations distributed across these 352 lake types in the sample, and the observations were not distributed evenly across the types. Some cells were represented by as few as 1 lake. In total, 536 - 352 = 184 lake types in the population of interest were not represented in the sample.
```{r sample_cell_counts, echo=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

samp_props <- df_model %>%
  mutate(obs = 1) %>%
  group_by(WSA9, state, size_cat) %>%
  summarize(n_lakes = sum(obs), .groups = "drop") %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes / sum(n_lakes), 7)) %>%
  mutate(type = "sample") 

save(samp_props, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

print(samp_props)
```
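
The count of unrepresented lake types can be obtained by comparing cell keys between the two tables. The sketch below does this in base R for a toy pair of tables; `pop_cells`, `samp_cells`, and the helper `key()` are hypothetical stand-ins, with only two design variables for brevity.

```r
# Illustrative only: hypothetical cells and counts.
pop_cells <- data.frame(
  WSA9     = c("CPL", "CPL", "NAP", "XER"),
  size_cat = c("4_10", "50_max", "4_10", "4_10"),
  n_lakes  = c(500, 20, 300, 80),
  stringsAsFactors = FALSE
)
samp_cells <- data.frame(
  WSA9     = c("CPL", "NAP"),
  size_cat = c("50_max", "4_10"),
  n_lakes  = c(3, 5),
  stringsAsFactors = FALSE
)

# Build a single key per cell from the design variables
key <- function(d) paste(d$WSA9, d$size_cat, sep = ":")

# Population cells with no sampled lakes
missing_cells <- pop_cells[!key(pop_cells) %in% key(samp_cells), ]
nrow(missing_cells)
missing_cells
```

In the toy tables, two population cells have no sampled lakes. Applied to `pframe` and `samp_props` with all three design variables, the same comparison should reproduce the difference between population and sample lake types.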

Below, a graphical comparison was constructed to depict the distribution of cells in the population of interest _versus_ those in the sample.
```{r compare_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=10}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

pframe %>%
  bind_rows(samp_props) %>%
  ggplot(aes(x = interaction(WSA9, state, size_cat), y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  theme(axis.text.x = element_blank()) +
  xlab("WSA9:state:size") +
  ylab("proportion in cell")
```

Another comparison between population and sample was constructed below by ecoregion. The samples were not balanced across ecoregions. Lakes in the Coastal Plains (CPL) ecoregion, for example, were clearly undersampled relative to their proportion of the population.
```{r eco_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_eco <- pframe %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population') 

save(pframe_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
```

```{r eco_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_eco <- samp_props %>%
  group_by(WSA9) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_eco, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")
```

```{r compare_eco_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=6}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_eco.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_eco.rda")

pframe_eco %>%
  bind_rows(samp_props_eco) %>%
  ggplot(mapping = aes(x = WSA9, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type), size = 3) +
  geom_line() +
  theme_tidybayes() +
  xlab("Ecoregion") +
  ylab("proportion in cell") + 
  theme(legend.position = "top",
        legend.title = element_blank(),
        legend.text = element_text(size = 14)) +
  theme(text = element_text(size = 12))
```

A similar comparison by state was constructed below.
```{r state_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_state <- pframe %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
```

```{r state_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_state <- samp_props %>%
  group_by(state) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_state, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")
```

```{r compare_state_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=10}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_state.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_state.rda")

pframe_state %>%
  bind_rows(samp_props_state) %>%
  ggplot(mapping = aes(x = state, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes(shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  theme(axis.text.x = element_text(angle = 45)) +
  xlab("State") +
  ylab("proportion in cell")
```

Finally, a comparison by lake size category is shown below. Note that small lakes were under-sampled relative to larger lakes by design.
```{r size_props_pop, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe.rda")

pframe_size <- pframe %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'population')

save(pframe_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
```

```{r size_props_sample, eval=FALSE, include=TRUE, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props.rda")

samp_props_size <- samp_props %>%
  group_by(size_cat) %>%
  summarise(n_lakes = sum(n_lakes)) %>%
  ungroup() %>%
  mutate(prop_cell = round(n_lakes/sum(n_lakes), 7)) %>%
  ungroup() %>%
  mutate(type = 'sample')

save(samp_props_size, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")
```

```{r compare_size_sample_pop_cells, echo=FALSE, fig.align='center', fig.height=4, fig.width=6}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/pframe_size.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/samp_props_size.rda")

pframe_size %>%
  bind_rows(samp_props_size) %>%
  ggplot(mapping = aes(x = size_cat, y = prop_cell, group = type, linetype = type)) +
  geom_point(stat = "identity", aes( shape = type, color = type)) +
  geom_line() +
  theme_tidybayes() +
  xlab("Size category") +
  ylab("proportion in cell")
```

## Sample-based estimates
The overall mean and standard deviation for N2O in the sample:
```{r sample_summary_n2o}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o),
             sd = sd(n2o)) %>%
  print()
```

The same summary for equilibrium N2O:
```{r sample_summary_n2oeq}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  print()
```

The saturation ratio (i.e., N2O / N2O-eq):
```{r sample_summary_n2osat}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  print()
```

Finally, roughly 67% of lakes in the sample were undersaturated (i.e., saturation ratio < 1):
```{r sample_summary_propsat}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  summarise(prop_undersat = mean((n2o / n2o_eq) < 1)) %>%  # proportion of rows with ratio < 1
  print()
```

Using only the sample observations, a plot was constructed below of the overall mean (dashed line) along with the ecoregion-specific means (black circles). The shaded areas indicate +/- 1 standard deviation. Neither dissolved N2O nor the saturation ratio was clearly structured by ecoregion in the sample, but there did appear to be some structure in the equilibrium N2O observations.
```{r sample_summary_eco, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(WSA9) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = WSA9, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = WSA9), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = WSA9, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() +  
  xlab("Ecoregion") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

The same summary by state is below.
```{r sample_summary_state, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7) +
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') + 
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7) +
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(state) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = state, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = state), 
              fill = 'lightgrey', 
              alpha = .7)+
  geom_line(aes(x = state, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.text.x = element_text(angle = 45)) + 
  xlab("State") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

Finally, the same summary by size category:
```{r sample_summary_size, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o),
             sd = sd( n2o)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 8.72, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) + 
  ylab("") +
  ggtitle("N2O")

p2 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o_eq),
             sd = sd(n2o_eq)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 7.48, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank()) +
  ylab("Sample mean") +
  ggtitle("N2O equilibrium")

p3 <- df_model %>%
  group_by(size_cat) %>%
  summarise(mean = mean(n2o / n2o_eq),
             sd = sd(n2o / n2o_eq)) %>%
  ggplot(aes(x = size_cat, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd, x = size_cat), fill = 'lightgrey', alpha = .7)+
  geom_line(aes(x = size_cat, y = mean))+
  geom_point()+
  geom_hline(yintercept = 1.17, linetype = 'dashed') +
  theme_tidybayes() + 
  theme(axis.text.x = element_text(angle = 45)) + 
  xlab("Size class") +
  ylab("") +
  ggtitle("N2O saturation ratio")

grid.arrange(p1, p2, p3)
```

## Sample data exploration
The empirical distributions of the N2O observations in the sample were summarized using the density and rug plots below. Note the natural log scale of the x axis. Both the N2O and equilibrium N2O data had considerable right skew even after the log transformation, which was not unexpected and has been noted in other studies [@Webb_etal_2019]. The saturation ratio was also skewed since it was derived from the other two observed variables (i.e., sat_ratio = n2o / n2o_eq).
```{r summary_N2O, echo=FALSE, fig.align='center', fig.height=8, fig.width=5, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

p1 <- df_model %>%
  ggplot(aes(x = n2o)) +
  geom_boxplot(aes(x = n2o, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o)) +
  geom_rug(aes(x = n2o), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log", breaks = c(0, 5, 10, 25, 50, 150)) +
  ylab("density") +
  xlab("N2O (nmol/L)") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

p2 <- df_model %>%
  ggplot(aes(x = n2o_eq)) +
  geom_boxplot(aes(x = n2o_eq, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o_eq)) +
  geom_rug(aes(x = n2o_eq), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log") +
  ylab("density") +
  xlab("Equilibrium N2O (nmol/L)") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

p3 <- df_model %>%
  ggplot(aes(x = n2o / n2o_eq)) +
  geom_boxplot(aes(x = n2o / n2o_eq, y = -0.5), outlier.shape = NA, alpha = 0.7) + 
  geom_density(aes(x = n2o / n2o_eq)) +
  geom_rug(aes(x = n2o / n2o_eq), show.legend = F ) +
  theme(text = element_text(size=12)) +
  scale_x_continuous(trans = "log", breaks = c(0, 1, 5, 10, 20)) +
  ylab("density") +
  xlab("N2O saturation ratio") +
  theme_tidybayes() +
  theme(axis.text.y = element_blank())

grid.arrange(p1, p2, p3)
```

Below are plots of N2O vs. NO3. The first plot shows log(N2O) vs. log(NO3), as well as the ordinal categories assigned to NO3 (vertical lines). The leftmost vertical line is dashed and separates the NO3 observations below the detection limit.
```{r summary_N2O_vs_NO3, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  geom_vline(xintercept = -7.5, linetype = "dashed") +
  geom_vline(xintercept = -5.5) +
  geom_vline(xintercept = -3.5) +
  geom_vline(xintercept = -1.5) +
  theme(text = element_text(size=12)) +
  theme_bw()
```
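
The construction of the ordinal NO3 variable is not shown in this section. The chunk below is a sketch of one way such categories could be assigned, assuming the vertical lines in the plot above (-7.5, -5.5, -3.5, and -1.5 on the log scale) are the category boundaries. The `no3` values are made up for illustration.

```{r sketch_no3_categories, eval=FALSE, include=TRUE}
# Hypothetical NO3 concentrations, one per intended category
no3 <- c(0.0003, 0.001, 0.01, 0.1, 1.5)

# Assumed cut points on the log scale (taken from the vertical lines in the plot)
log_breaks <- c(-Inf, -7.5, -5.5, -3.5, -1.5, Inf)

# Ordered factor with 5 levels; level 1 would hold the censored observations
no3_cat <- cut(log(no3),
               breaks = log_breaks,
               labels = 1:5,
               ordered_result = TRUE)
```

Four cut points yield five ordered levels, which agrees with the 5-level `no3_cat` variable used in later plots.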

In the plot above, the trend is increasing and nonlinear on the log scale. The increasing variance in N2O along the NO3 gradient suggested a potential mediator of the relationship between NO3 and N2O. Below are plots of N2O vs. NO3 for 6 quantiles of the surface temperature measurements (quantiles increasing from 1 to 6). These plots suggested that the NO3 effect on N2O may have been stronger in lakes with higher observed temperatures.
```{r summary_N2O_vs_NO3_surftemp, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(surftemp, 6))) +
  theme_bw()
```
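
The visual impression from the faceted plot above could be quantified by estimating a separate slope within each temperature bin. The chunk below is a sketch on simulated data only: the variable names mirror those in `df_model`, but every value and the assumed interaction strength are hypothetical. Quantile-based `cut()` is used as a base R approximation of `ntile()`.

```{r sketch_slope_by_temp_bin, eval=FALSE, include=TRUE}
set.seed(1)
n <- 600

# Simulated predictors (hypothetical ranges)
log_no3  <- runif(n, -8, 0)
surftemp <- runif(n, 10, 30)

# Assumed data-generating process: the NO3 slope increases with temperature
slope   <- 0.02 + 0.01 * (surftemp - 10)
log_n2o <- 2.5 + slope * log_no3 + rnorm(n, sd = 0.3)

# Six temperature bins of roughly equal size
temp_bin <- cut(surftemp,
                breaks = quantile(surftemp, probs = 0:6 / 6),
                labels = 1:6,
                include.lowest = TRUE)

# Least-squares slope of log(N2O) on log(NO3) within each bin
slopes <- sapply(split(data.frame(log_no3, log_n2o), temp_bin),
                 function(d) unname(coef(lm(log_n2o ~ log_no3, data = d))[2]))
```

Slopes that increase across bins would be consistent with a temperature-dependent NO3 effect, although bin-wise fits ignore the design structure and serve only as an exploratory check.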

The next plot below shows the relationship between N2O and NO3 at 6 different quantiles (increasing 1 to 6) of the log-scaled lake surface area estimates.
```{r summary_N2O_vs_NO3_logarea, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(log_area, 6))) +
  theme_bw()
```

Similar plots are below, but with NO3 expressed as an ordered categorical variable with 5 levels. The positive and monotonic trends are similar to the previous plots where NO3 was treated as continuous. Note the large number of observations in the first NO3 category (no3_cat = 1). This category represented all of the censored observations for NO3, which was most of the data.
```{r summary_N2O_vs_NO3cat, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, notch = TRUE, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  theme_bw()
```

```{r summary_N2O_vs_NO3cat_surftemp, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(surftemp, 6))) +
  theme_bw()
```

```{r summary_N2O_vs_NO3cat_logarea, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(outlier.shape = NA, color = "black", alpha = 0.7) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ as.factor(ntile(log_area, 6))) +
  theme_bw()
```

Below is a plot of log(N2O) vs. log(NO3) by ecoregion, which suggested that the NO3 effect on N2O may have varied by ecoregion.
```{r summary_N2O_vs_NO3_ecoregion, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = log(no3), y = log(n2o))) +
  geom_point(show.legend = F) +
  geom_smooth(method = "loess", span = 2) +
  geom_vline(xintercept = -7.5, linetype = "dashed") +
  geom_vline(xintercept = -5.5) +
  geom_vline(xintercept = -3.5) +
  geom_vline(xintercept = -1.5) +
  theme(text = element_text(size=12)) +
  facet_wrap(~ WSA9) +
  theme_bw()
```

Below is the same plot as above but for the ordered categorical version of NO3.
```{r summary_N2O_vs_NO3cat_ecoregion, echo=FALSE, fig.align='center', fig.height=6, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  ggplot(aes(x = no3_cat, y = log(n2o), color = 1)) +
  geom_point( position = position_jitterdodge(), show.legend = F ) +
  geom_boxplot(aes(x = no3_cat, y = log(n2o)), 
               outlier.shape = NA, 
               color = "black", 
               alpha = 0.7) + 
  facet_wrap(~ WSA9) +
  theme(text = element_text(size=12)) +
  theme_bw()
```

The plot below shows trends by state within just the Temperate Plains (TPL) ecoregion. Within states, the number of observations was relatively small, but the trends appeared closer to linear.
```{r summary_N2O_vs_NO3_wsa9state3, echo=FALSE, fig.align='center', fig.height=4, fig.width=8, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

df_model %>%
  filter(WSA9 == "TPL") %>%
  ggplot(aes(x = log(no3), y = log(n2o), group = state, color = state)) +
  geom_point(show.legend = F) +
  geom_smooth(method = "lm", span = 2, alpha = 0.1) +
  theme(text = element_text(size=12)) +
  theme_bw()
```

# Model fitting
The first regression model was constructed to estimate the joint distribution of log-transformed N2O and equilibrium N2O conditional on the design factors. Each log-transformed observation, $i \in 1, \ldots, N = 984$, for each response, $p \in 1, \ldots, P = 2$, was assumed to be drawn from a multivariate normal distribution with the parameters $\nu$ and $\Sigma$, where $\nu$ is the multivariate mean estimated conditional on the design effects and $\Sigma$ is a covariance matrix containing the observation-level variances and residual correlation:
$$Y \sim MVN(\nu, \Sigma)$$

The multivariate mean is a vector of mean parameters, $\nu = [\mu_{p=1}, \mu_{p=2}]$, one for each response. Each mean is further defined by a linear combination of parameters where, for each response $p$ and observation $i$:

$$\begin{aligned}
\mu_{pi} &= \alpha_{0(pi)} + \alpha_{1(pij)} + \alpha_{2(pijk)} + \alpha_{3(pijkl)} \\
\alpha_1 &\sim MVN(0, \Lambda_1) \\
\alpha_2 &\sim MVN(0, \Lambda_2) \\
\alpha_3 &\sim MVN(0, \Lambda_3)
\end{aligned}$$

The linear combination of parameters defining $\mu$ above includes a fixed global intercept, $\alpha_0$, that is estimated directly from the data, and three separate, latent group-level effects matrices, $\alpha_1, \alpha_2, \alpha_3$. The group effects were assumed to be multivariate normal and centered on zero in multivariate space. The spread of the effects around zero is determined by a covariance matrix, $\Lambda_1, \Lambda_2, \text{or } \Lambda_3$, each of which is estimated directly from the data. These covariance terms are further defined where:

$$\Lambda = \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix} \chi \begin{pmatrix} \tau_{p=1} & 0 \\ 0 & \tau_{p=2} \end{pmatrix}$$

The $\tau$ parameters are the group-level scale parameters, which constrain the spread of effects for each response, and $\chi$ comprises the group-level residual correlation matrix:

$$\chi = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}$$

wherein $\varrho$ is the group-level residual correlation between responses.

The explicit indexing in the notation above conveys the relationship between the parameters and each observation, $i$, and emphasizes the nested structure of the observations within the group effects. Specifically, every observation, $i$, was nested in a lake size category, $l$, which was nested in a state, $k$, and ecoregion, $j$. The parameter $\alpha_1$, therefore, accounted for ecoregion-scale group effects or deviations from the global mean; $\alpha_2$ accounted for state-level group effects nested in ecoregions; and $\alpha_3$ accounted for lake size group effects within states and ecoregions.  
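
The nested indexing can be made concrete by simulating the linear predictor for a small design. The chunk below is a simplified sketch for a single response, so the group effects are drawn from univariate normals rather than the bivariate distributions described above. The design, the intercept, and the scale values are all hypothetical.

```{r sketch_nested_effects, eval=FALSE, include=TRUE}
set.seed(42)

# Hypothetical design: 3 ecoregions x 2 states x 2 size classes = 12 cells
design <- expand.grid(size_cat = c("small", "large"),
                      state    = c("S1", "S2"),
                      WSA9     = c("E1", "E2", "E3"),
                      stringsAsFactors = FALSE)

# Group identifiers mirroring WSA9, WSA9:state, and WSA9:state:size_cat
eco_id   <- design$WSA9
state_id <- paste(design$WSA9, design$state, sep = ":")
cell_id  <- paste(state_id, design$size_cat, sep = ":")

# Draw one zero-centered effect per unique group level
draw_effects <- function(ids, tau) {
  lev <- unique(ids)
  setNames(rnorm(length(lev), mean = 0, sd = tau), lev)
}

alpha0 <- 2                                  # global intercept (assumed value)
alpha1 <- draw_effects(eco_id,   tau = 0.3)  # ecoregion effects
alpha2 <- draw_effects(state_id, tau = 0.2)  # state-within-ecoregion effects
alpha3 <- draw_effects(cell_id,  tau = 0.1)  # size-within-state-within-ecoregion effects

# Linear predictor: each row picks up the effect of every group it is nested in
design$mu <- alpha0 + unname(alpha1[eco_id] + alpha2[state_id] + alpha3[cell_id])
```

Because the state identifier is built as `WSA9:state`, a state label that appears in two ecoregions receives a separate effect in each, which is what the nesting implies.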

Finally, the observation-level covariance term, $\Sigma$, was parameterized as:
$$\Sigma = \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix} \Omega \begin{pmatrix} \sigma_{p=1} & 0 \\ 0 & \sigma_{p=2} \end{pmatrix}$$

wherein the $\sigma$ parameters are the observation-level standard deviations for each response and $\Omega$ comprises the observation-level residual correlation matrix:
$$\Omega = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
wherein $\rho$ is the residual correlation between responses.
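
A covariance matrix is conventionally assembled from standard deviations and a correlation matrix as diag(sd) × correlation × diag(sd). The chunk below is a small sketch of that decomposition in base R; the values of `sigma` and `rho` are hypothetical and are not estimates from the fitted model.

```{r sketch_cov_decomposition, eval=FALSE, include=TRUE}
# Hypothetical observation-level standard deviations for the two responses
sigma <- c(0.5, 0.2)

# Hypothetical residual correlation between the responses
rho <- 0.3
Omega <- matrix(c(1, rho,
                  rho, 1), nrow = 2, byrow = TRUE)

# Covariance matrix: scale the correlation matrix by the standard deviations
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)

# The pieces can be recovered from the covariance matrix
sd_back  <- sqrt(diag(Sigma))  # standard deviations
cor_back <- cov2cor(Sigma)     # correlation matrix
```

The same construction applies to the group-level covariance matrices, with the group-level scale parameters in place of `sigma` and the group-level correlation in place of `rho`.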

For model fitting, priors were needed for all parameters conditioned directly on the data, which included the global intercept, the scale parameters, and the correlation matrices. A normal or Gaussian prior, $N(\mu = 2, \sigma = 1)$, centered near the (log-scale) data means, was used for the global intercept parameter for each response. This prior was considered minimally informative as it placed most (~80%) of the prior mass over values between approximately 2 and 27 nmol/L for median N2O or N2O equilibrium concentration and included support in the tails for values approaching 0 nmol/L on the lower end and 80 nmol/L on the high end. We placed $Exp(2)$ priors over all scale parameters, which placed most of the support between values very close to 0 and values near 1 (central 80% density interval from approximately 0.05 to 1.15). Finally, for the correlation matrices, an $LKJ(\eta = 2)$ prior was used, which, for a 2-dimensional response, placed most support for correlations between approximately -0.9 and 0.9. This prior seemed reasonable as there were no clear causal mechanisms that were thought to ensure a strong direct correlation between the N2O measures. Any potential residual dependence was expected to be indirect due to, for example, a common causal factor (e.g., elevation, temperature). For more information on prior choice recommendations in Stan, see: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
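
The prior intervals quoted above can be reproduced with base R quantile and distribution functions. The chunk below is a sketch of that check. The correlation calculation relies on the fact that, for a 2 x 2 correlation matrix, an $LKJ(\eta)$ prior implies $(\rho + 1)/2 \sim Beta(\eta, \eta)$.

```{r sketch_prior_intervals, eval=FALSE, include=TRUE}
# Central 80% interval of the N(2, 1) intercept prior, back-transformed from the log scale
intercept_80 <- exp(qnorm(c(0.10, 0.90), mean = 2, sd = 1))

# Central 80% interval of the Exp(2) prior on the scale parameters
scale_80 <- qexp(c(0.10, 0.90), rate = 2)

# Prior probability that a correlation falls within (-0.9, 0.9) under LKJ(2), 2 x 2 case
eta <- 2
p_within <- pbeta((0.9 + 1) / 2, eta, eta) - pbeta((-0.9 + 1) / 2, eta, eta)
```

Running the same calculations with other candidate values for the prior parameters is a quick way to see how much the implied intervals change before refitting a model.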

The $\textbf{brms}$ package [@Burkner_2017] for $\textbf{R}$ [@R_Core_Team_2021] was used to fit all of the models in a fully Bayesian setting. The formula syntax of the $\textbf{brms}$ package is similar to the syntax used in the $\textbf{lme4}$ package that is widely used to fit mixed effects models in frequentist settings. In either package, the linear predictor for $\mu$ described above could be expressed as:

$$\sim 1 + (1|WSA9) + (1|WSA9:state) + (1|WSA9:state:size)$$

In the $\textbf{brms}$ package, there is additional functionality and syntax for multivariate responses and for allowing the varying intercepts in a multivariate model to be correlated, e.g.:

$$\begin{aligned}
N_2O_{dissolved} &\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size) \\
N_2O_{equilibrium} &\sim 1 + (1|a|WSA9) + (1|b|WSA9:state) + (1|c|WSA9:state:size)
\end{aligned}$$

The above syntax indicates that the linear predictors for both responses in the multivariate model have the same group-level varying effects, and that each of those effects is allowed to be correlated between responses.

For the remainder of this document, only this simplified syntax is presented to describe the model parameterizations. For more information on $\textbf{brms}$ functionality and syntax with multivariate response models, the package vignette may be helpful, and can be found at: https://cran.r-project.org/web/packages/brms/vignettes/brms_multivariate.html.

## Model 1
The first model fit was the one described above.
```{r n2o_mod_mv_1, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 + 
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(exponential(2), class = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), # centered near data mean
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(exponential(2), class = "sigma", resp = "logn2oeq"),
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod1 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model,
                prior = priors,
                control = list(adapt_delta = 0.99, max_treedepth = 14),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 145,
                chains=4, 
                iter=5000, 
                cores=4)

save(n2o_mod1, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
```


### Summarize fit
The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below. There were no obvious issues with the HMC sampling. All $\hat{R}$ values were less than 1.01 and effective sample size ($ESS$) calculations suggested that the posterior contained a sufficient number of effective samples for conducting inference.
```{r print_mod1, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")

print(n2o_mod1, prior = T)
```
In the summary above, the estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. Finally, note the relatively small, but positive residual correlation between the two N2O responses. 

Before investing too much into the interpretation of this model, however, the model fit was evaluated below using a series of graphical posterior predictive checks [PPC; @Gelman_etal_2014; @Gelman_etal_2020, Ch. 11].
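
The $\hat{R}$ diagnostic cited in the fit summary compares between-chain and within-chain variance. The chunk below is a sketch of the classic (non-split) version of that statistic on simulated draws. It is an assumption-laden simplification for illustration: Stan-based tools typically report a more robust rank-normalized, split-chain variant, so values will not match the model summary exactly. The function `rhat_basic` and the simulated draws are hypothetical.

```{r sketch_rhat_basic, eval=FALSE, include=TRUE}
set.seed(145)
n_iter  <- 1000
n_chain <- 4

# Simulated draws for one parameter: rows are iterations, columns are chains
draws <- matrix(rnorm(n_iter * n_chain), nrow = n_iter, ncol = n_chain)

# Classic potential scale reduction factor
rhat_basic <- function(x) {
  n <- nrow(x)
  W <- mean(apply(x, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(x))         # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

# Well-mixed chains give a value close to 1
rhat_ok <- rhat_basic(draws)

# Shifting one chain mimics a chain stuck in a different region
shifted <- draws
shifted[, 1] <- shifted[, 1] + 3
rhat_bad <- rhat_basic(shifted)
```

Values close to 1 indicate that the chains agree with one another, which is the basis for the 1.01 threshold used in the text.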

### Model checks
#### Dissolved N2O
Below is a series of panels illustrating graphical PPCs for the log(N2O) component of the model. The top left panel compares a density plot of the observed data (black line) to density lines drawn for 200 samples from the posterior predictive distribution (PPD; blue lines) of the fitted model. The top right panel similarly compares the cumulative density distributions. The left middle panel simultaneously compares means _vs._ standard deviations for 1000 draws from the PPD (blue dots) to the sample mean and standard deviation (black dot). The right middle panel compares skewness _vs._ kurtosis for 1000 draws from the PPD to the skewness and kurtosis values calculated for the observed data. The bottom left panel compares max _vs._ min values for 1000 draws from the PPD to the max and min values of the sample data. Finally, the bottom right panel shows the observed _vs._ average predicted values for each observation in the sample. The average predicted values were calculated as the mean prediction for each observation in the PPD based on 1000 draws.
```{r ppc_n2o1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2o",
         type = "scatter_avg",
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
The general takeaway from the PPCs above was that the model replicated the central tendency of the observed data fairly well, but failed to sufficiently replicate other important aspects of the distribution, such as skewness and kurtosis. The observed _vs._ average predictions scatterplot suggested substantial heteroscedasticity in the errors. 
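
The logic behind the test-statistic panels can be sketched without the fitted model. In the minimal example below, the observed data and the PPD draws are simulated stand-ins, and the `skewness` and `kurtosis` functions are hand-rolled moment estimators defined only for this illustration (the checks above assume functions with those names are already available in the session). Each statistic is computed once per replicated data set and compared with the value from the observed data.

```r
# Hand-rolled moment statistics; these definitions are assumptions made for
# this illustration and may differ from the functions used in the checks.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2
}

set.seed(2)
n_obs <- 984      # sample size reported in this document
n_draws <- 1000   # number of PPD draws used in the panels

# Simulated stand-in for the observed response: right-skewed, heavy-tailed
y <- 2 + 0.3 * rexp(n_obs)

# Simulated stand-in for the PPD of a Gaussian model that matches the
# mean and standard deviation of y (one row per draw)
yrep <- matrix(rnorm(n_draws * n_obs, mean(y), sd(y)), nrow = n_draws)

# One value of each statistic per replicated data set
stat_rep <- cbind(mean = apply(yrep, 1, mean),
                  sd = apply(yrep, 1, sd),
                  skewness = apply(yrep, 1, skewness),
                  kurtosis = apply(yrep, 1, kurtosis))
stat_obs <- c(mean = mean(y), sd = sd(y),
              skewness = skewness(y), kurtosis = kurtosis(y))

# Share of replicates whose statistic exceeds the observed value;
# values near 0 or 1 flag a feature the model fails to reproduce
ppp <- sapply(names(stat_obs),
              function(s) mean(stat_rep[, s] > stat_obs[[s]]))
round(ppp, 3)
```

In this toy setting the Gaussian replicates reproduce the mean and standard deviation but not the higher moments, which is the same qualitative pattern of misfit described above.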

The same checks were run below, but for the test set of 95 held-out, second-visit data points. 
```{r ppc_n2o1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
grid.arrange(
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2o",
         type = "scatter_avg",
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
The patterns in misfit indicated above for the re-visit data were similar to the patterns indicated in the PPCs with the training data.

#### Equilibrium N2O
Below are PPCs for the equilibrium N2O component of the model. As with the dissolved N2O response above, the model replicated the central tendency reasonably well, but performed less well at replicating some important aspects of the overall distribution. 
```{r ppc_n2oeq1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

Below are the same PPCs for equilibrium N2O in the re-visit sites.
```{r ppc_n2oeq1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
pp_check(n2o_mod1,
         newdata = df_test,
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod1, 
         newdata = df_test,
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
#### Bivariate
The graphical check below compares bivariate density contours estimated from the observed data (black lines) to density contours estimated for each of 20 draws from the PPD. The model appeared to do a good job of replicating the bivariate mean, but was poor at representing the overall joint distribution.
```{r ppc_biv1, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 20) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o), y = log(n2o_eq))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

The same bivariate check is shown below for the re-visit data.
```{r ppc_biv1_test, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 20) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o), y = log(n2o_eq))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
The graphical PPCs below were aimed at evaluating how well the multivariate model did at representing the observed saturation ratio: 
$$N_2O_{dissolved}:N_2O_{equilibrium}$$
This quantity was estimated as a derived variable by dividing the back-transformed dissolved N2O PPD by the back-transformed equilibrium N2O PPD. Likewise, the proportion of under-saturated lakes in the sample was estimated by counting the lakes in each posterior predictive draw for which the ratio was < 1 and dividing that count by the total number of lakes in the sample, which was 984.
Overall, these checks indicated that properly representing the tails of the N2O and N2O-eq observations would likely be necessary in order to better replicate the observed saturation metrics. For example, the model did a poor job replicating the observed proportion of under-saturated lakes, underestimating it by more than 10 percentage points, on average.
```{r ppc_sat1, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

The top left panel, above, is a density plot of the observed saturation ratio (black line) compared to an estimate using 50 draws from the model (blue lines). The top right panel shows the observed proportion of under-saturated lakes compared to a model estimate based on 500 draws from the PPD. The left middle panel shows the mean _vs._ standard deviation of the saturation ratio for the observed data compared to the same estimates for 500 posterior draws from the model's PPD. The right middle panel shows the max _vs._ min for the sample compared to 500 draws from the model's PPD. Finally, the bottom left panel shows the observed _vs._ average predicted saturation ratio for all 984 lakes sampled in the dataset.
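
The derived saturation quantities can also be illustrated with plain matrices. The sketch below is a minimal example under stated assumptions: the two matrices are simulated stand-ins for PPD draws on the log scale (rows are draws, columns are lakes), and their location and scale values are made up rather than taken from the fitted model.

```r
set.seed(3)
n_obs <- 984     # lakes in the training sample
n_draws <- 500   # PPD draws used for the proportion panel

# Simulated stand-ins for PPD draws on the log scale (rows = draws,
# columns = lakes); the location and scale values are hypothetical
logn2o <- matrix(rnorm(n_draws * n_obs, 2.00, 0.25), nrow = n_draws)
logn2oeq <- matrix(rnorm(n_draws * n_obs, 1.95, 0.10), nrow = n_draws)

# Saturation ratio per draw and lake; dividing the back-transformed values
# is the same as exponentiating the difference of the logs
sat_pred <- exp(logn2o) / exp(logn2oeq)
stopifnot(isTRUE(all.equal(sat_pred, exp(logn2o - logn2oeq))))

# Proportion of under-saturated lakes (ratio < 1) within each draw
prop_usat <- rowMeans(sat_pred < 1)

# Posterior predictive summary of that proportion
quantile(prop_usat, c(0.025, 0.5, 0.975))
```

Because the proportion depends on how often the two predictive distributions cross, it is sensitive to the tails of both responses, which is consistent with the misfit noted above.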

The same PPCs are shown below for the re-visit data.
```{r ppc_sat1_test, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
grid.arrange(
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 95) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_test, mapping = aes(xintercept = sum(n2o < n2o_eq)/95)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod1, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
The checks above indicated that the model did a similarly underwhelming job of replicating some key properties of the saturation metrics calculated from the re-visit data.

#### R-square
Below, the Bayesian $R^2$ values are reported for each response in the model. 
```{r r2_1, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
round(bayes_R2(n2o_mod1, resp = "logn2o", cores = 1), 3)
round(bayes_R2(n2o_mod1, resp = "logn2oeq", cores = 1), 3)
```
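
For readers unfamiliar with the statistic, the sketch below computes a residual-based Bayesian $R^2$ by hand: for each posterior draw, the variance of the fitted values is divided by the sum of that variance and the variance of the residuals. It is assumed here that this mirrors the quantity summarized above; the function name `bayes_r2_sketch`, the simulated data, and the made-up posterior draws are for illustration only.

```r
# Residual-based Bayesian R-squared, one value per posterior draw.
# `ypred` holds expected values (rows = draws, columns = observations).
bayes_r2_sketch <- function(y, ypred) {
  resid <- -sweep(ypred, 2, y)   # y - ypred, draw by draw
  var_fit <- apply(ypred, 1, var)
  var_res <- apply(resid, 1, var)
  var_fit / (var_fit + var_res)
}

set.seed(4)
n_obs <- 200
n_draws <- 400
x <- rnorm(n_obs)
y <- 2 + 0.5 * x + rnorm(n_obs, sd = 0.5)

# Hypothetical posterior draws of an intercept and slope near the true values
b0 <- rnorm(n_draws, 2, 0.04)
b1 <- rnorm(n_draws, 0.5, 0.04)
ypred <- outer(b0, rep(1, n_obs)) + outer(b1, x)

r2 <- bayes_r2_sketch(y, ypred)
round(c(Estimate = mean(r2), Est.Error = sd(r2),
        quantile(r2, c(0.025, 0.975))), 3)
```

Because the statistic is computed per draw, it carries posterior uncertainty, which is why an estimate, an error, and an interval are reported rather than a single number.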

The $R^2$ values were also estimated for the re-visit data.
```{r r2_1_test, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod1.rda")
round(bayes_R2(n2o_mod1, resp = "logn2o", newdata = df_test, cores = 1), 3)
round(bayes_R2(n2o_mod1, resp = "logn2oeq", newdata = df_test, cores = 1), 3)
```

## Model 2
In an attempt to better fit the observed data, the next model included distributional sub-models to allow for heterogeneous variances for each response conditional on the survey design structure.
```{r n2o_mod_2, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ 1 +
               (1 | a | WSA9) + 
               (1 | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ 1 +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod2 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
                control = list(adapt_delta = 0.975, max_treedepth = 12),
                #sample_prior = "only",
                save_pars = save_pars(all = TRUE),
                seed = 84512,
                chains = 4, 
                iter = 5000, 
                cores = 4)

save(n2o_mod2, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
```
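
To get a feel for what the priors on the distributional sub-model imply, the sketch below simulates residual standard deviations from the `sigma` priors specified above. It assumes that `sigma` is modeled on the log scale (a log link), and it simulates only a single hypothetical grouping level rather than all three, so it is a simplified prior predictive check and not a reproduction of the fitted model.

```r
set.seed(5)
n_sim <- 10000

# Priors copied from the model 2 specification
int_sigma <- rnorm(n_sim, mean = -1, sd = 2)   # Intercept, dpar = "sigma"
sd_group <- rexp(n_sim, rate = 2)              # sd, dpar = "sigma"

# One hypothetical group-level deviation per simulation
z_group <- rnorm(n_sim, mean = 0, sd = sd_group)

# Assumed log link: the linear predictor is log(sigma)
sigma_avg <- exp(int_sigma)             # residual SD for an average group
sigma_grp <- exp(int_sigma + z_group)   # residual SD for one random group

round(rbind(average = quantile(sigma_avg, c(0.05, 0.5, 0.95)),
            group = quantile(sigma_grp, c(0.05, 0.5, 0.95))), 3)
```

Under that assumption, the prior median of the residual standard deviation is about `exp(-1)`, or roughly 0.37 on the log-concentration scale, with wide tails in both directions.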

### Summarize fit
The summaries of the estimated parameters and key HMC convergence diagnostics for the fitted model are printed below. 
```{r print_mod2, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")

print(n2o_mod2, prior = TRUE)
```
From the summary above, note the moderate and positive residual correlation between the two N2O responses. The estimated standard deviations for the varying group effects on the mean behavior of the dissolved N2O response suggested fairly low, but non-zero variability across each of the three levels. The standard deviations estimated for the same varying effects for equilibrium N2O were also relatively small. However, before investing too much into the interpretation of these results, the model fit was further evaluated below using a series of graphical posterior predictive checks (PPCs).

### Model checks
Below, the same PPCs were performed as with the initial model (see above for more details on each panel). 

#### Dissolved N2O
Though the checks below suggested some improvement in replicating the tails of the observed data, this model did a poorer job at replicating central tendency.
```{r ppc_n2o2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(-5, 5) +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200, 
         cores = 1) + 
  theme_tidybayes() +
  xlim(-5, 5) +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"),
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2o",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
The checks below suggest this model offered no improvement upon the initial model for equilibrium N2O. This model also appeared to do a poorer job of replicating the mean and overall standard deviation compared to the initial model.
```{r ppc_n2oeq2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "ecdf_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod2, 
         resp = "logn2oeq",
         type = "scatter_avg", 
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
This check perhaps suggested an improvement with regard to replicating the joint density. However, the predictions were still clearly over-dispersed relative to the observations.
```{r ppc_biv2, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
The PPCs for the saturation metrics below indicated that including the distributional models was perhaps an improvement on the initial model in some aspects; in particular, the bias in the predicted proportion of under-saturated lakes was substantially decreased. However, there appeared to still be issues in replicating the tails as well as issues with central tendency.
```{r ppc_sat2, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(logn2o < logn2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod2, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
Relative to model 1, there was a substantial decrease in the $R^2$ estimate for the dissolved N2O component of this model. The estimate for the equilibrium N2O component was similar to that of model 1.
```{r r2_2, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod2.rda")
round(bayes_R2(n2o_mod2, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod2, resp = "logn2oeq", cores = 1), 3)
```
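
As a rough illustration of what the $R^2$ summaries above represent, the sketch below computes a draw-wise Bayesian $R^2$ as the variance of the fitted values divided by the sum of that variance and the residual variance. The data are simulated and the object names (`y`, `mu_draws`, `bayes_r2_sketch`) are hypothetical. It is a simplified stand-in under those assumptions, not the code that produced the estimates reported here.

```r
# Minimal sketch of a draw-wise Bayesian R^2 (residual-based form).
# All data are simulated; `mu_draws` stands in for posterior draws of the
# expected value of the response (draws in rows, observations in columns).
set.seed(1)
n_obs   <- 200
n_draws <- 500
x <- rnorm(n_obs)
y <- 2 + 0.5 * x + rnorm(n_obs, sd = 0.3)

# Hypothetical posterior draws of an intercept and a slope
b0 <- rnorm(n_draws, mean = 2,   sd = 0.02)
b1 <- rnorm(n_draws, mean = 0.5, sd = 0.02)
mu_draws <- outer(b0, rep(1, n_obs)) + outer(b1, x)

bayes_r2_sketch <- function(y, mu_draws) {
  # Variance of the fitted values and of the residuals within each draw
  var_fit <- apply(mu_draws, 1, var)
  var_res <- apply(sweep(mu_draws, 2, y), 1, var)  # var(mu - y) equals var(y - mu)
  var_fit / (var_fit + var_res)
}

r2 <- bayes_r2_sketch(y, mu_draws)
round(c(Estimate = mean(r2), quantile(r2, c(0.025, 0.975))), 3)
```

Because the ratio is computed separately for each draw, the result is a distribution of $R^2$ values that can be summarized with a mean and an interval.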

## Model 3
In the next model, we used covariates to try to improve the fit. The categorical version of the NO3 covariate was used as a monotonic ordinal predictor in the dissolved N2O component of the model. For the equilibrium N2O component, we included surface temperature and log-transformed elevation, along with their interaction. Both components also retained the distributional specifications included in model 2 above.
```{r n2o_mod_3, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ 1 +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod3 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 12),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 98456,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
```
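
To make the monotonic ordinal predictor more concrete, the sketch below evaluates a monotonic effect by hand. It assumes the parameterization described in the brms monotonic-effects vignette, in which the effect at level $x$ (coded $0, \dots, D$) is $b \, D \sum_{i=1}^{x} \zeta_i$ with $\zeta$ a simplex. The function name `mo_effect` and the values of `b_example` and `zeta_example` are hypothetical and are not estimates from model 3.

```r
# Sketch of a monotonic effect for an ordinal predictor with D + 1 levels.
# The effect at level x (coded 0, ..., D) is b * D * sum(zeta[1:x]), where
# zeta is a simplex (non-negative, sums to 1). Values of b and zeta are made up.
mo_effect <- function(x, b, zeta) {
  D <- length(zeta)
  stopifnot(all(x %in% 0:D), all(zeta >= 0), abs(sum(zeta) - 1) < 1e-8)
  cum_zeta <- c(0, cumsum(zeta))  # cumulative share of the total change at each level
  b * D * cum_zeta[x + 1]
}

zeta_example <- c(0.1, 0.2, 0.3, 0.4)  # hypothetical: larger steps at higher levels
b_example    <- 0.25                   # hypothetical average change per adjacent level
levels_coded <- 0:4

eff <- mo_effect(levels_coded, b = b_example, zeta = zeta_example)
data.frame(level = levels_coded, effect = eff)
```

Unequal elements of `zeta_example` yield unequal steps between adjacent levels, which is how a monotonic term can represent a non-linear but order-preserving relationship.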

### Summarize fit
The fitted parameters and MCMC diagnostics are below.
```{r print_mod3, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")

print(n2o_mod3, prior = TRUE)
```

### Model checks
#### Dissolved N2O
The PPCs below indicated a better fit compared to the previous models. The central tendency and tail behavior appeared to be reasonably well replicated by comparison. However, the observed _vs._ predicted plot suggested that larger observed values were being systematically underestimated.
```{r ppc_full_checks_mod_n2o3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "dens_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(0, 5) +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "ecdf_overlay",
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlim(0, 5) +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"),  
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"),  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2o",
         type = "scatter_avg",  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```
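
The test-statistic checks in the panels above compare a statistic of the observed data with the same statistic computed on each replicated data set. The sketch below reproduces that logic in base R under simplifying assumptions: the data are simulated, and `yrep` and `stat_ppc` are hypothetical names for a matrix of posterior predictive draws and a helper function, respectively.

```r
# Sketch of a test-statistic PPC: compare the observed mean and sd with the
# same statistics computed on each replicated data set. Data are simulated;
# `yrep` (draws in rows) stands in for posterior predictive draws.
set.seed(2)
n_obs   <- 300
n_draws <- 1000
y    <- rnorm(n_obs, mean = 2, sd = 0.4)
yrep <- matrix(rnorm(n_draws * n_obs, mean = 2, sd = 0.4), nrow = n_draws)

stat_ppc <- function(y, yrep, stat) {
  t_obs <- stat(y)
  t_rep <- apply(yrep, 1, stat)
  list(observed   = t_obs,
       replicated = t_rep,
       # share of replicated statistics at or above the observed value
       p_value    = mean(t_rep >= t_obs))
}

ppc_mean <- stat_ppc(y, yrep, mean)
ppc_sd   <- stat_ppc(y, yrep, sd)
c(mean = ppc_mean$p_value, sd = ppc_sd$p_value)
```

Tail proportions close to 0 or 1 would suggest that the model does not reproduce the corresponding feature of the observed data.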

#### Equilibrium N2O
The PPCs below indicated that this model appeared to be an improvement for equilibrium N2O as well. However, some checks (e.g., skewness) suggested some room for additional improvement.
```{r ppc_full_checks_mod_n2oeq3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "dens_overlay", 
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "ecdf_overlay", 
         ndraws = 200,
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"),  
         ndraws = 1000,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         ndraws = 1000, 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"),  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod3, 
         resp = "logn2oeq",
         type = "scatter_avg",  
         ndraws = 1000,
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
The check for the joint distribution below also suggested an improvement upon the previous models.
```{r ppc_bv_check_mod_n2o3, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This model looked to be an improvement with regard to the PPCs for the saturation metrics. However, the proportion of under-saturated lakes remained biased low and other checks indicated that further improvements would be ideal.
```{r ppc_sat_check_mod_n2o3, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = mean(usat_pred)) %>%  # proportion per draw; avoids hard-coding n = 984
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = mean(n2o < n2o_eq))) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod3, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
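
The saturation checks rest on two derived quantities: the ratio of dissolved to equilibrium N2O and the proportion of lakes with a ratio below 1. The sketch below computes both from simulated stand-ins for the posterior predictive draws. The matrices `logn2o_rep` and `logn2oeq_rep` and all parameter values are hypothetical and only mimic the structure of the draws used in the chunk above.

```r
# Sketch of the saturation-ratio checks using simulated stand-ins for the
# posterior predictive draws (draws in rows, lakes in columns, log scale).
set.seed(3)
n_lakes <- 100
n_draws <- 500
n2o_obs   <- exp(rnorm(n_lakes, mean = 2.0, sd = 0.4))
n2oeq_obs <- exp(rnorm(n_lakes, mean = 1.9, sd = 0.1))
logn2o_rep   <- matrix(rnorm(n_draws * n_lakes, 2.0, 0.4), nrow = n_draws)
logn2oeq_rep <- matrix(rnorm(n_draws * n_lakes, 1.9, 0.1), nrow = n_draws)

# The ratio of exponentiated predictions equals exp() of the difference in logs
sat_rep <- exp(logn2o_rep - logn2oeq_rep)
sat_obs <- n2o_obs / n2oeq_obs

# Proportion undersaturated (ratio < 1), per draw and in the observed data;
# rowMeans() and mean() avoid hard-coding the number of lakes
prop_rep <- rowMeans(sat_rep < 1)
prop_obs <- mean(sat_obs < 1)

c(observed = prop_obs, quantile(prop_rep, c(0.025, 0.5, 0.975)))
```

If the observed proportion falls outside the bulk of the replicated proportions, the model is biased with respect to this measure.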

#### R-square
The $R^2$ estimates for this model are below and suggested substantial improvements on the previous models.
```{r r2_mod_n2o3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
round(bayes_R2(n2o_mod3, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod3, resp = "logn2oeq", cores = 1), 3)
```

### Covariate effects
Below are plots illustrating the modeled effects of covariates on both N2O and equilibrium N2O.

#### N2O
The conditional effects plots below for N2O illustrate a positive, monotonic, and non-linear relationship between NO3 and N2O; and a negative, linear relationship between surface temperature and N2O.
```{r conditional_effects_mod_n2o3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=3}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")

p1 <- conditional_effects(n2o_mod3, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod3, 
                          resp = "logn2o", 
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
          plot(p2, plot = F)[[1]],
          ncol = 2)

rm(p1, p2)

annotate_figure(plt, top = text_grob("N2O: conditional effects", 
               color = "black", face = "bold", size = 14))
```

#### Equilibrium N2O
The modeled effects below for the equilibrium N2O component of the model illustrated a negative relationship between equilibrium N2O and both predictors and an interaction such that the surface temperature effect became slightly steeper at lower elevations.
```{r conditional_effects_mod_n2oeq3, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=3}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod3.rda")
p1 <- conditional_effects(n2o_mod3, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod3, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob("Equilibrium N2O: conditional effects", 
               color = "black", face = "bold", size = 14))
```
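
The interaction described above means that the slope for surface temperature is not a single number but depends on elevation. The sketch below shows the arithmetic for a linear predictor with a `surftemp:log_elev` term. All coefficient values are hypothetical, chosen only to mimic the qualitative pattern (a negative temperature slope that is steeper at lower elevations), and are not the fitted values from model 3.

```r
# Sketch: with an interaction, the slope for surface temperature depends on
# log-elevation: d(mu)/d(surftemp) = b_temp + b_int * log_elev.
# Coefficient values are hypothetical, chosen only to mimic the pattern described.
b_0    <-  2.300
b_temp <- -0.020
b_elev <- -0.010
b_int  <-  0.001

mu <- function(surftemp, log_elev) {
  b_0 + b_temp * surftemp + b_elev * log_elev + b_int * surftemp * log_elev
}

temp_slope <- function(log_elev) b_temp + b_int * log_elev

log_elev_grid <- c(low = 2, mid = 5, high = 8)
slopes <- temp_slope(log_elev_grid)
slopes

# The analytical slope matches a one-degree difference in the linear predictor
mu(11, log_elev_grid) - mu(10, log_elev_grid)
```

With a positive interaction coefficient and a negative main effect, the temperature slope is most negative at the lowest elevation in the grid.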

## Model 4
In the next model, covariate terms were also included in the $\sigma$ components of both models in order to try to better capture remaining heterogeneity in the variances of both N2O and N2O-eq.
```{r n2o_mod_mv_4, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ mo(no3_cat) +
               surftemp +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod4 <- brm(bf_n2o + bf_n2oeq + set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 12),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 15851,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod4, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
```
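
The idea of placing covariates on $\sigma$ can be illustrated with a small location-scale example. The sketch below simulates a Gaussian response whose standard deviation depends on a covariate through a log link, then recovers the coefficients by maximum likelihood. It is a base R sketch with hypothetical values (`a_sigma`, `b_sigma`) and a single covariate, not a reproduction of the multilevel brms model above.

```r
# Sketch of a distributional (location-scale) Gaussian model: the residual
# SD is modeled on the log scale, so sigma = exp(a + b * x) stays positive.
# All values are simulated or hypothetical.
set.seed(4)
n <- 2000
x <- runif(n, -1, 1)
a_sigma <- -1.0  # hypothetical intercept of log(sigma)
b_sigma <-  0.8  # hypothetical slope of log(sigma)
sigma <- exp(a_sigma + b_sigma * x)
y <- 2 + 0.3 * x + rnorm(n, sd = sigma)

# Negative log-likelihood with linear predictors for both mu and log(sigma)
nll <- function(par) {
  mu  <- par[1] + par[2] * x
  sig <- exp(par[3] + par[4] * x)
  -sum(dnorm(y, mean = mu, sd = sig, log = TRUE))
}

# Start from the constant-mean, constant-variance fit
start <- c(mean(y), 0, log(sd(y)), 0)
fit <- optim(start, nll, method = "BFGS")
est <- setNames(fit$par, c("mu_Intercept", "mu_x", "sigma_Intercept", "sigma_x"))
round(est, 2)
```

Because of the log link, a coefficient of `b_sigma` on the $\sigma$ scale corresponds to multiplying the residual standard deviation by `exp(b_sigma)` for each unit increase in the covariate.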

### Summarize fit
Below is a summary of the fitted parameters along with some convergence diagnostics.
```{r print_mod4, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

print(n2o_mod4, prior = TRUE, digits = 3)
```

### Model checks
The same PPCs used for the previous models were applied to this model.

#### Dissolved N2O
Again, this model appeared to be an improvement on the previous model, particularly with regard to the more constant variance indicated in the observed _vs._ predicted plot (bottom, right panel).
```{r ppc_full_checks_mod_n2o4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
pp_check(n2o_mod4, 
         ndraws = 200,  # `nsamples` is a deprecated alias of `ndraws`; only `ndraws` is kept
         resp = "logn2o",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod4, 
         ndraws = 200,  # `nsamples` is a deprecated alias of `ndraws`; only `ndraws` is kept
         resp = "logn2o",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         resp = "logn2o",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
This component of the model also seemed to be an improvement over model 3, with better representation in the tails as indicated in the skewness _vs._ kurtosis PPC. 
```{r ppc_full_checks_mod_n2oeq4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("density")
,
pp_check(n2o_mod4, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O)"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  theme_tidybayes()
,
pp_check(n2o_mod4, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
The bivariate check again indicated an improvement over the previous model, with a tighter fit of the PPC to the observed bivariate density.
```{r ppc_bv_check_mod_n2o4, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This check also suggested an improvement over the previous models, with better tail behavior and less bias in the proportion under-saturated measure.
```{r ppc_sat_check_mod_n2o4, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = mean(usat_pred)) %>%  # proportion per draw; avoids hard-coding n = 984
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = mean(n2o < n2o_eq))) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod4, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
The Bayesian $R^2$ estimates below indicated an improvement from the previous models.
```{r r2_mod_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")
round(bayes_R2(n2o_mod4, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod4, resp = "logn2oeq", cores = 1), 3)
```

### Covariate effects
#### N2O
The conditional effects plots for the covariate effects on N2O remained largely unchanged from the previous model.
```{r conditional_effects_mod_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o", 
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
          plot(p2, plot = F)[[1]],
          ncol = 2)

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

Below are estimates of the conditional effects of the covariates on $\sigma$ for N2O. These plots suggested a large effect of NO3 on the variance of N2O, but little to no effect of surface temperature.
```{r conditional_effects_sigma_n2o4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("no3_cat"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", sigma)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The covariate effects on equilibrium N2O remained largely the same as for the previous model.
```{r conditional_effects_mod_n2oeq4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)
plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The covariate effects on $\sigma$ for equilibrium N2O suggested a negative effect of surface temperature and little to no effect of elevation.
```{r conditional_effects_sigma_n2oeq4, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod4.rda")

p1 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)
p2 <- conditional_effects(n2o_mod4, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("log_elev"), 
                          plot = F)
plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", sigma)), 
               color = "black", face = "bold", size = 14))
```

## Model 5
In the next model, more complexity was added to the N2O component by including lake surface area (log scale) as a covariate, along with interactions of NO3 with log(area) and of NO3 with surface temperature.
```{r n2o_mod_mv_5, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(log(n2o) ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             sigma ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat), 
             family = gaussian())

bf_n2oeq <- bf(log(n2o_eq) ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             sigma ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = gaussian())

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "logn2o"),
  prior(normal(0, 1), class = "b", resp = "logn2o"),
  prior(exponential(2), class = "sd", resp = "logn2o"),
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2o"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2o"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "logn2oeq"), 
  prior(normal(0, 1), class = "b", resp = "logn2oeq"), 
  prior(exponential(2), class = "sd", resp = "logn2oeq"), 
  prior(normal(-1, 2), class = "Intercept", dpar = "sigma", resp = "logn2oeq"),
  prior(normal(0, 1), class = "b", dpar = "sigma", resp = "logn2oeq"),
  prior(exponential(2), class = "sd", dpar = "sigma", resp = "logn2oeq"),
  
  prior(lkj(2), class = "rescor"),
  prior(lkj(2), class = "cor")
  )

n2o_mod5 <- brm(bf_n2o + 
                  bf_n2oeq +
                  set_rescor(rescor = TRUE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 12),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 54741,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod5, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
```
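
To make the monotonic NO3 terms and their interactions in the formula above more concrete, the following sketch shows how a monotonic effect and a monotonic-by-continuous interaction translate into category-specific slopes. It assumes the parameterization documented for `mo()` in brms, in which a coefficient gives the average difference between adjacent categories and a simplex apportions the total change among the steps. The helper `mo_effect()`, the simplex values, the coefficients, and the use of four NO3 categories are all hypothetical and are not estimates from `n2o_mod5`.

```r
# Toy illustration of a monotonic effect and its interaction with a
# continuous covariate. All numbers are made up for illustration.
mo_effect <- function(x, zeta) {
  # x: integer category index starting at 0
  # zeta: simplex with one element per step between adjacent categories
  D <- length(zeta)
  vapply(x,
         function(i) if (i == 0) 0 else D * sum(zeta[seq_len(i)]),
         numeric(1))
}

zeta_main <- c(0.5, 0.3, 0.2)  # hypothetical simplex, main NO3 effect
zeta_int  <- c(0.2, 0.3, 0.5)  # hypothetical simplex, NO3:log_area term
b_no3  <- 0.10                 # average step between adjacent NO3 levels
b_area <- -0.01                # log_area slope at the lowest NO3 level
b_int  <- -0.02                # average change in that slope per NO3 step

no3_level <- 0:3

# Contribution of NO3 to the linear predictor at each level
no3_main <- b_no3 * mo_effect(no3_level, zeta_main)

# Slope of log_area implied at each NO3 level
area_slope <- b_area + b_int * mo_effect(no3_level, zeta_int)

data.frame(no3_level, no3_main, area_slope)
```

Under these made-up values the log(area) slope becomes steadily more negative as the NO3 level increases, which is the kind of pattern an interaction of this form can represent.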

### Summarize fit
Below is a summary of the fitted parameters along with MCMC convergence diagnostics.
```{r print_mod5, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

print(n2o_mod5, prior = T, digits = 3)
```

### Model checks
Again, the same PPCs as above were performed for this model.

#### N2O PPC
The PPC for N2O looked similar to that for the previous model.
```{r ppc_full_checks_mod_n2o5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2o",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2o",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2o",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O
Again, the PPCs for this model were similar to those for the previous model, which was unsurprising given that the N2O-eq submodel was unchanged.
```{r ppc_full_checks_mod_n2oeq5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "dens_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod5, 
         ndraws = 200,
         resp = "logn2oeq",
         type = "ecdf_overlay",
         cores = 1) + 
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod5, 
         ndraws = 1000,
         resp = "logn2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate
This PPC was also similar to that for the previous model.
```{r ppc_bv_check_mod_n2o5, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 20) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = logn2o, 
                      y = logn2oeq, 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation
This check was also similar to that for the previous model, with perhaps slightly less bias in the estimated proportion of undersaturated waterbodies. There was also a potentially concerning extreme prediction in the observed _vs._ predicted PPC.
```{r ppc_sat_check_mod_n2o5, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(exp(logn2o) < exp(logn2oeq), 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod5, 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = exp(logn2o) / exp(logn2oeq)) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
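
The saturation summaries used in this check can be sketched outside of the plotting code. The example below is a minimal base R illustration, assuming posterior predictive draws arranged as a draws-by-observations matrix on the log scale. All values are simulated placeholders rather than NLA data or output from `n2o_mod5`, and the object names are hypothetical.

```r
set.seed(1)
n_obs <- 200
n_draws <- 50

# Hypothetical observed concentrations (not NLA data)
n2o_obs   <- rlnorm(n_obs, meanlog = 2.0, sdlog = 0.3)
n2oeq_obs <- rlnorm(n_obs, meanlog = 2.0, sdlog = 0.1)

# Hypothetical posterior predictive draws on the log scale
# (rows are draws, columns are observations)
logn2o_rep   <- matrix(rnorm(n_draws * n_obs, 2.0, 0.3), n_draws, n_obs)
logn2oeq_rep <- matrix(rnorm(n_draws * n_obs, 2.0, 0.1), n_draws, n_obs)

# Saturation ratio: dissolved over equilibrium concentration
sat_obs <- n2o_obs / n2oeq_obs
sat_rep <- exp(logn2o_rep - logn2oeq_rep)  # equals exp(a) / exp(b)

# Proportion undersaturated: observed, and one value per draw
prop_usat_obs <- mean(sat_obs < 1)
prop_usat_rep <- rowMeans(sat_rep < 1)

# Share of replicated proportions at or below the observed proportion
mean(prop_usat_rep <= prop_usat_obs)
```

Using `mean()` and `rowMeans()` on the logical comparison gives the proportion directly, so the sample size does not need to be hard-coded in the denominator.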

#### R-square
```{r r2_mod_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")
round(bayes_R2(n2o_mod5, resp = "logn2o", cores = 1), 3) 
round(bayes_R2(n2o_mod5, resp = "logn2oeq", cores = 1), 3)
```
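
For readers unfamiliar with the quantity reported above, the sketch below computes one common residual-based definition of Bayesian R-square: for each posterior draw, the variance of the fitted values divided by the sum of that variance and the residual variance. Whether this matches the exact internals of `bayes_R2()` should be checked against the brms documentation. The data and "posterior draws" here are simulated placeholders, not output from `n2o_mod5`.

```r
set.seed(2)
n_obs <- 100
n_draws <- 200

# Hypothetical predictor and response
x <- rnorm(n_obs)
y <- 1 + 0.5 * x + rnorm(n_obs, sd = 0.5)

# Hypothetical posterior draws of an intercept and a slope
b0 <- rnorm(n_draws, 1.0, 0.05)
b1 <- rnorm(n_draws, 0.5, 0.05)

# Expected values for every draw (rows) and observation (columns)
mu <- outer(b0, rep(1, n_obs)) + outer(b1, x)

# Residuals per draw; the sign does not matter for the variance
res <- sweep(mu, 2, y)

var_fit <- apply(mu, 1, var)
var_res <- apply(res, 1, var)
r2 <- var_fit / (var_fit + var_res)

# Posterior summary of R-square
round(c(Estimate = mean(r2), quantile(r2, c(0.025, 0.975))), 3)
```

Because the ratio is computed draw by draw, the result is a posterior distribution for R-square rather than a single value.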

### Covariate effects
#### N2O
The conditional effects plots for N2O suggested a similar effect of NO3, but interesting interactions between NO3 and lake area and between NO3 and surface temperature. For lake area, the effect was estimated to be larger and more negative at the highest levels of NO3 and only slightly negative at the lowest level of NO3. For surface temperature, the effect was estimated to be largest and positive at the highest level of NO3 and negative at the lowest level of NO3.
```{r conditional_effects_mod_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("no3_cat"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("log_area:no3_cat"), 
                          plot = F)
p3 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o", 
                          effects = c("surftemp:no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
                 plot(p2, plot = F)[[1]],
                 plot(p3, plot = F)[[1]])

rm(p1, p2, p3)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The estimated covariate effects on $\sigma$ suggested a negative relationship with log(area) and a positive relationship, again, with NO3.
```{r conditional_effects_sigma_n2o5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("log_area"),
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2o",
                          dpar = "sigma",
                          effects = c("no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", sigma)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The estimated covariate effects on N2O-eq remained largely the same as estimated in the previous model.
```{r conditional_effects_mod_n2oeq5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

```{r conditional_effects_sigma_n2oeq5, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod5.rda")

p1 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("surftemp"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod5, 
                          resp = "logn2oeq",
                          dpar = "sigma",
                          effects = c("log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", sigma)), 
               color = "black", face = "bold", size = 14))
```

## A Final Model
As demonstrated above, models excluding the NO3 covariate consistently resulted in poorer fits to the observed dissolved N2O data. Including surface temperature and elevation in the equilibrium N2O part of the model resulted in substantially improved replication of key aspects of the observed data. Likewise, added flexibility in the distributional terms for both dissolved and equilibrium N2O led to improvements. 

To make inferences from this model for N2O in the population of interest, however, the included covariates needed to be (1) fully observed across that population or (2) their missingness needed to be modeled. For the lake area and elevation covariates, data _was_ available for all lakes from previously compiled geospatial databases. However, neither surface temperature nor NO3 was observed for lakes outside of the sample. Both were therefore only partially observed with respect to the target population, and their missingness needed to be accounted for in the model. Therefore, a more complex model was constructed below that included surface temperature and NO3 as additional responses conditioned on the survey design variables and fully observed covariates. This approach to inference for N2O was similar to a Bayesian structural equation model [@Merkle_etal_2021; @Merkle_Rosseel_2018]. The main details of the logical dependence structure could be characterized as:

$$\begin{align} 
\color{#1F449C}{\boldsymbol{N_2O_{diss}}} &\sim Survey + Area + \color{#F05039}{\boldsymbol{NO_3}} + \color{#EEBAB4}{\boldsymbol{Temp}} \\ 
\color{#A8B6CC}{\boldsymbol{N_2O_{equil}}} &\sim Survey + Elev  + \color{#EEBAB4}{\boldsymbol{Temp}}\\
\color{#F05039}{\boldsymbol{NO_3}} &\sim Survey + Area + \color{#EEBAB4}{\boldsymbol{Temp}} \\ 
\color{#EEBAB4}{\boldsymbol{Temp}} &\sim Survey + Lat + Elev + Day
\end{align}$$

Variables in colored text above were treated as partially observed with respect to the population of interest (i.e., observed only in the sample), whereas variables in black text were considered fully observed. The partially observed variables, being dissolved and equilibrium N2O, NO3, and surface temperature, were each modeled conditional on the survey design variables and other partially and/or fully observed covariates. This structural equation approach requires a more complex set of post-processing steps compared to a typical MRP analysis. In order to propagate estimates and uncertainty through the dependency structure and make inferences, the fitted model was used to first predict surface temperature in the target population, since it depended only on the fully observed covariates. That predictive distribution was then used alongside the relevant fully observed covariates to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. These steps were carried out in the "Predict to population" section to follow.
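
The ordering of these prediction steps can be illustrated with a toy simulation. The base R sketch below is not the post-processing code used for the fitted model. It assumes made-up coefficient draws, three NO3 categories, fixed shape values, and a linear (rather than monotonic) NO3 effect, and it omits the survey design terms and the equilibrium N2O response. It only shows how, within each posterior draw, a predicted temperature feeds the NO3 prediction and both feed the N2O prediction, so that uncertainty carries through the chain.

```r
set.seed(3)
n_lakes <- 5
n_draws <- 1000

# Fully observed covariates for hypothetical target-population lakes
lat      <- c(30, 35, 40, 45, 48)
log_area <- log(c(5, 20, 80, 300, 1000))

# Hypothetical posterior draws of coefficients (one value per draw)
a_temp   <- rnorm(n_draws, 4.0, 0.02)     # temperature intercept
b_lat    <- rnorm(n_draws, -0.02, 0.001)  # temperature ~ latitude
b_no3_t  <- rnorm(n_draws, 0.05, 0.01)    # NO3 ~ temperature
b_no3_a  <- rnorm(n_draws, -0.10, 0.02)   # NO3 ~ log(area)
cuts     <- c(0, 1.5)                     # NO3 thresholds (held fixed)
a_n2o    <- rnorm(n_draws, 1.8, 0.05)     # N2O intercept
b_n2o_n  <- rnorm(n_draws, 0.15, 0.02)    # N2O ~ NO3 (linear here)
b_n2o_t  <- rnorm(n_draws, 0.005, 0.001)  # N2O ~ temperature

temp <- no3 <- n2o <- matrix(NA_real_, n_draws, n_lakes)
for (d in seq_len(n_draws)) {
  # Step 1: temperature from fully observed covariates (Gamma, log link)
  mu_t <- exp(a_temp[d] + b_lat[d] * lat)
  temp[d, ] <- rgamma(n_lakes, shape = 50, rate = 50 / mu_t)

  # Step 2: NO3 category (0, 1, 2) from predicted temperature and area,
  # sampled by inverting the cumulative logit probabilities
  eta <- b_no3_t[d] * temp[d, ] + b_no3_a[d] * log_area
  u <- runif(n_lakes)
  no3[d, ] <- (u > plogis(cuts[1] - eta)) + (u > plogis(cuts[2] - eta))

  # Step 3: N2O from predicted NO3 and predicted temperature
  mu_g <- exp(a_n2o[d] + b_n2o_n[d] * no3[d, ] + b_n2o_t[d] * temp[d, ])
  n2o[d, ] <- rgamma(n_lakes, shape = 30, rate = 30 / mu_g)
}

# Posterior predictive intervals for N2O, one row per lake
t(apply(n2o, 2, quantile, probs = c(0.025, 0.5, 0.975)))
```

Keeping the same draw index `d` across all three steps is what ties the predictions together: each N2O draw is conditioned on the temperature and NO3 values simulated in that same draw.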

In the final model below, the submodel for surface temperature assumed Gamma-distributed errors and the linear predictor included the survey design variables, latitude, elevation, and Julian date. The shape parameter was also modeled as a function of latitude to address increasing response variance along the latitudinal gradient. The NO3 submodel was a cumulative logit formulation and the linear predictor included all of the survey factors as well as surface temperature and lake area. 
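
As a brief illustration of the cumulative logit formulation, the sketch below converts a linear predictor and a set of ordered thresholds into category probabilities using $P(Y \le k) = \text{logit}^{-1}(\tau_k - \eta)$. The function name `cumulative_probs()`, the threshold values, and the linear predictor values are hypothetical and are not taken from the fitted model.

```r
# Category probabilities under a cumulative logit model with flexible
# thresholds. Thresholds and linear predictor values are hypothetical.
cumulative_probs <- function(eta, thresholds) {
  # P(Y <= k) = plogis(tau_k - eta); the last category closes the sum to 1
  cum <- c(plogis(thresholds - eta), 1)
  diff(c(0, cum))
}

thresholds <- c(-1, 0.5, 2)  # three cutpoints imply four ordered categories

# Probabilities for a low and a high value of the linear predictor
p_low  <- cumulative_probs(eta = -0.5, thresholds = thresholds)
p_high <- cumulative_probs(eta =  1.5, thresholds = thresholds)

round(rbind(p_low, p_high), 3)
```

A larger linear predictor shifts probability mass toward the higher categories while the thresholds themselves stay fixed.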

The N2O and N2O-eq responses were each modeled with Gamma distributed errors, but with the same covariate structure as in model 5. The same structure was also employed for the shape terms in these responses, corresponding to the $\sigma$ terms in the previous model. Though not shown in this document, the Gamma error structure appeared to result in slightly better performance in the predictive checks compared to the Gaussian errors in previous models. This was primarily apparent in the saturation ratio checks, which may have been more sensitive to model performance in the tails of the N2O responses. Others have also indicated that the Gamma error distribution can work well for dissolved N2O data [@Webb_etal_2019].
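
When comparing shape effects with the $\sigma$ effects from earlier models, it may help to recall how the two relate to response variability. The sketch below assumes the mean-and-shape parameterization used by brms for the Gamma family (rate equal to shape divided by the mean), under which the variance is $\mu^2 / shape$ and the coefficient of variation is $1 / \sqrt{shape}$. The mean and shape values are hypothetical.

```r
set.seed(4)
mu <- 8                 # hypothetical mean concentration
shape <- c(5, 50, 500)  # hypothetical shape values

# Mean and shape parameterization: rate = shape / mu
sims <- sapply(shape, function(s) rgamma(1e5, shape = s, rate = s / mu))

data.frame(
  shape,
  mean_sim  = colMeans(sims),
  cv_sim    = apply(sims, 2, sd) / colMeans(sims),
  cv_theory = 1 / sqrt(shape),
  # sigma of a lognormal with the same coefficient of variation
  sigma_equiv = sqrt(log(1 + 1 / shape))
)
```

Under this parameterization a larger shape implies less relative variability, so a positive coefficient in a shape submodel points in the opposite direction from a positive coefficient in a $\sigma$ submodel.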

Note that there was no residual correlation term for this model, since residual correlations are not defined for the Gamma and cumulative logit submodels (brms supports them only for Gaussian and Student-t families). Dropping the observation-level residual correlation term was deemed a reasonable compromise that enabled modeling the missingness of NO3, in particular. Nevertheless, the random intercepts again allowed for potential correlations between responses at the group levels. 

```{r n2o_mod_mv_6, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")

bf_n2o <- bf(n2o ~ mo(no3_cat) +
               log_area +
               surftemp + 
               mo(no3_cat):log_area +
               mo(no3_cat):surftemp +
               (mo(no3_cat) | a | WSA9) + 
               (mo(no3_cat) | b | WSA9:state) + 
               (1 | c | WSA9:state:size_cat),
             shape ~ log_area +
               mo(no3_cat) +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_n2oeq <- bf(n2o_eq ~ surftemp +
                 log_elev +
                 surftemp:log_elev +
                 (1 | a | WSA9) + 
                 (1 | b | WSA9:state) +
                 (1 | c | WSA9:state:size_cat),
             shape ~ surftemp +
               log_elev +
               (1 | WSA9) + 
               (1 | WSA9:state) + 
               (1 | WSA9:state:size_cat),
             family = Gamma(link = "log"))

bf_temp <- bf(surftemp ~ lat +
                s(log_elev) +
                s(jdate) +
                (1 | a | WSA9) + 
                (1 | b | WSA9:state) +
                (1 | c | WSA9:state:size_cat),
              shape ~ lat,
              family = Gamma(link = "log"))

bf_no3 <- bf(no3_cat ~ surftemp +
               log_area +
               (1 | a | WSA9) +
               (1 | b | WSA9:state) +
               (1 | c | WSA9:state:size_cat),
             family = cumulative(link = "logit", threshold="flexible"))

priors <- c(
  prior(normal(2, 1), class = "Intercept", resp = "n2o"),
  prior(normal(0, 1), class = "b", resp = "n2o"),
  prior(exponential(2), class = "sd", resp = "n2o"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2o"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2o"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2o"),
  
  prior(normal(2, 1), class = "Intercept", resp = "n2oeq"), 
  prior(normal(0, 1), class = "b", resp = "n2oeq"),  
  prior(exponential(2), class = "sd", resp = "n2oeq"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "n2oeq"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "n2oeq"),
  prior(exponential(2), class = "sd", dpar = "shape", resp = "n2oeq"),
  
  prior(normal(3, 1), class = "Intercept", resp = "surftemp"), 
  prior(normal(0, 1), class = "b", resp = "surftemp"), 
  prior(exponential(0.5), class = "sds", resp = "surftemp"),
  prior(exponential(2), class = "sd", resp = "surftemp"),
  prior(normal(5, 4), class = "Intercept", dpar = "shape", resp = "surftemp"),
  prior(normal(0, 1), class = "b", dpar = "shape", resp = "surftemp"),
  
  prior(normal(0, 3), class = "Intercept", resp = "no3cat"),
  prior(normal(0, 1), class = "b", resp = "no3cat"),
  prior(exponential(1), class = "sd", resp = "no3cat"),
  
  prior(lkj(2), class = "cor")
  )

n2o_mod6 <- brm(bf_n2o + 
                  bf_n2oeq + 
                  bf_temp + 
                  bf_no3 + 
                  set_rescor(rescor = FALSE),
                data = df_model, 
                prior = priors,
  control = list(adapt_delta = 0.975, max_treedepth = 14),
  #sample_prior = "only",
  save_pars = save_pars(all = TRUE),
  seed = 85132,#14548,
  #init = my_inits,
  init_r = 0.5,
  chains=4, 
  iter=5000, 
  cores=4)

save(n2o_mod6, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
```

### Summarize fit
Below is a summary of the fitted parameters and MCMC diagnostics. 
```{r print_mod6, echo=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

print(n2o_mod6, digits = 3, prior = T)
```

### Model checks
Below, the same PPCs for N2O and N2O-eq were employed as before.

#### N2O PPC
The PPCs for N2O from this model were about as reasonable as those for models 4 and 5 above.
```{r ppc_full_checks_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "dens_overlay",
         ndraws = 40,
         cores = 1) + 
  theme_tidybayes() +
  scale_x_continuous(trans = "log") +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "ecdf_overlay",
         ndraws = 40, 
         cores = 1) + 
  theme_tidybayes() +
  scale_x_continuous(trans = "log") +
  xlab(expression(paste("log(N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2o",
         type = "scatter_avg", 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
, ncol = 2)
```

#### Equilibrium N2O PPC
Again, the PPCs for N2O-eq in this model were similar to those for models 4 and 5.
```{r ppc_full_checks_mod_n2oeq6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "dens_overlay",
         ndraws = 40,
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("density")
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "ecdf_overlay",
         ndraws = 40, 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  theme_tidybayes() +
  xlab(expression(paste("log(Equilibrium N"[2],"O) concentration"))) + 
  ylab("cumulative density")
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("mean", "sd"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("kurtosis", "skewness"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "stat_2d", 
         stat = c("min", "max"), 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
,
pp_check(n2o_mod6, 
         resp = "n2oeq",
         type = "scatter_avg", 
         cores = 1) + 
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  theme_tidybayes()
, ncol = 2)
```

#### Bivariate PPC
This model again provided a very reasonable representation of the bivariate relationship between N2O and N2O-eq (below).
```{r ppc_bv_check_mod_n2o6, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
df_model %>%
  add_predicted_draws(n2o_mod6,
                      resp = c("n2o","n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  ggplot(aes(x = log(n2o_obs), y = log(n2oeq_obs))) +
  geom_density_2d(aes(x = log(n2o), 
                      y = log(n2oeq), 
                      group = .draw),
                  bins = 10,
                  color = "lightblue", 
                  alpha = 0.4) +
  geom_density_2d(color = "black", bins = 10) +
  xlim(1.25, 2.75) +
  ylim(1.25, 2.75) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
```

#### Saturation PPC
The saturation ratio PPCs below showed behavior similar to that of models 4 and 5 above, but with perhaps slightly less bias in the predictions for the proportion of undersaturated waterbodies and fewer extreme predictions for the means and standard deviations. The observed _vs._ predicted PPC also appeared to have a better-behaved variance and no extreme predictions, compared to models 4 and 5 with the lognormal errors.
```{r ppc_sat_check_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
grid.arrange(
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(n2o < n2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 984) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_model, mapping = aes(xintercept = sum(n2o < n2o_eq)/984)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  theme_tidybayes()
,
df_model %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```
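
The proportion-undersaturated check in the second panel can be illustrated with simulated values. The sketch below is only an illustration: the objects `y_obs`, `yeq_obs`, `yrep`, and `yeqrep` are hypothetical stand-ins filled with made-up gamma draws, not objects from this workflow.

```r
set.seed(1)
n_obs   <- 200  # hypothetical number of sampled lakes
n_draws <- 50   # hypothetical number of posterior predictive draws

# Simulated "observed" dissolved and equilibrium concentrations
y_obs   <- rgamma(n_obs, shape = 20,  rate = 20 / 7.5)
yeq_obs <- rgamma(n_obs, shape = 200, rate = 200 / 8)

# Simulated replicates: one row per draw, one column per lake
yrep   <- matrix(rgamma(n_obs * n_draws, shape = 20,  rate = 20 / 7.5),
                 nrow = n_draws)
yeqrep <- matrix(rgamma(n_obs * n_draws, shape = 200, rate = 200 / 8),
                 nrow = n_draws)

# Test statistic: proportion of lakes with dissolved < equilibrium
prop_obs <- mean(y_obs < yeq_obs)   # observed value (a single number)
prop_rep <- rowMeans(yrep < yeqrep) # one value per replicated data set

# Share of replicated statistics at or above the observed value
ppp <- mean(prop_rep >= prop_obs)
```

Taking `mean()` of the logical comparison avoids hard-coding the number of lakes in the denominator, so the same expression applies to data sets of any size.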

The plot below shows the same PPC, but for the "test" or second-visit data. Overall, the model appeared to perform similarly on these data as on the data used to fit it.
```{r ppc_sat_check_testdata_mod_n2o6, echo=FALSE, fig.align='center', fig.height=8, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
grid.arrange(
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"),
                      ndraws = 50) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  ggplot(aes(x = sat_ratio)) +
  geom_density(aes(x = sat_pred, group = .draw), 
               n = 1024, 
               adjust = 1,
               color = "lightblue") +
  geom_density(n = 1024, adjust = 2) +
  xlim(0, 5) +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(usat_pred = ifelse(n2o < n2oeq, 1, 0)) %>%
  group_by(.draw) %>%
  summarise(prop_pred = sum(usat_pred) / 95) %>%
  ggplot(aes(x = prop_pred)) +
  geom_histogram(binwidth = 0.001, fill = "lightblue") +
  geom_vline(data = df_test, mapping = aes(xintercept = sum(n2o < n2o_eq)/95)) +
  xlab("Proportion undersaturated waterbodies") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(mean_yrep = mean(sat_pred),
         sd_yrep = sd(sat_pred)) %>% 
  ungroup() %>%
  mutate(mean_y = mean(sat_ratio),
         sd_y = sd(sat_ratio)) %>%
  ggplot(aes(x = mean_yrep, y = sd_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = mean_y), linetype = "dashed") +
  geom_hline(aes(yintercept = sd_y), linetype = "dashed") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.draw) %>%
  mutate(min_yrep = min(sat_pred),
         max_yrep = max(sat_pred)) %>% 
  ungroup() %>%
  mutate(min_y = min(sat_ratio),
         max_y = max(sat_ratio)) %>%
  ggplot(aes(x = min_yrep, y = max_yrep)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_vline(aes(xintercept = min_y), linetype = "dashed") +
  geom_hline(aes(yintercept = max_y), linetype = "dashed") +
  theme_tidybayes()
,
df_test %>%
  add_predicted_draws(n2o_mod6, 
                      resp = c("n2o", "n2oeq"), 
                      ndraws = 500) %>%
  rename(n2o_obs = n2o,
         n2oeq_obs = n2o_eq) %>%
  tidyr::pivot_wider(names_from = .category,
              values_from = .prediction) %>%
  mutate(sat_ratio = n2o_obs / n2oeq_obs,
         sat_pred = n2o / n2oeq) %>%
  group_by(.row) %>%
  mutate(mean_yrep = mean(sat_pred)) %>% 
  filter(.draw == 1) %>% 
  ggplot(aes(x = mean_yrep, y = sat_ratio)) +
  geom_point(color = "lightblue") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_tidybayes()
, ncol = 2)
```

#### R-square
Below are estimates of the Bayesian $R^2$. The estimates for N2O and N2O-eq were largely similar to those for models 4 and 5 above. The $R^2$ for the surface temperature response also suggested a fairly good fit.
```{r r2_mod_n2o6, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
round(bayes_R2(n2o_mod6, resp = "n2o"), 3) 
round(bayes_R2(n2o_mod6, resp = "n2oeq"), 3)
round(bayes_R2(n2o_mod6, resp = "surftemp"), 3)
```
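
As a reminder of what these estimates represent, the residual-based Bayesian $R^2$ is computed for each posterior draw as the variance of the fitted values divided by the sum of that variance and the variance of the residuals. The sketch below applies this definition to a simulated linear regression. The data and the "posterior draws" `b0` and `b1` are invented for illustration and are unrelated to the fitted model.

```r
set.seed(2)
n_obs   <- 100
n_draws <- 200

# Simulated data from a simple linear model
x <- rnorm(n_obs)
y <- 1 + 0.8 * x + rnorm(n_obs, sd = 0.5)

# Hypothetical posterior draws of the intercept and slope
b0 <- rnorm(n_draws, mean = 1,   sd = 0.05)
b1 <- rnorm(n_draws, mean = 0.8, sd = 0.05)

# Fitted values: one row per draw, one column per observation
mu <- outer(b0, rep(1, n_obs)) + outer(b1, x)

# Residuals for each draw (observed minus fitted)
res <- matrix(y, nrow = n_draws, ncol = n_obs, byrow = TRUE) - mu

# Draw-wise variance of fitted values and of residuals
var_fit <- apply(mu, 1, var)
var_res <- apply(res, 1, var)

# Bayesian R2: one value per posterior draw
r2 <- var_fit / (var_fit + var_res)

# Posterior summary
c(estimate = mean(r2), quantile(r2, probs = c(0.025, 0.975)))
```

Because the ratio is evaluated separately for every draw, the result is a posterior distribution for $R^2$ rather than a single number.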

Below are the same $R^2$ estimates, but for the second-visit data. These estimates were similar to those for the data used to fit the model, suggesting that the model may perform similarly well out-of-sample.
```{r r2_test_mod_n2o6, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=4, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_test.rda")
round(bayes_R2(n2o_mod6, resp = "n2o", newdata = df_test), 3) 
round(bayes_R2(n2o_mod6, resp = "n2oeq", newdata = df_test), 3)
round(bayes_R2(n2o_mod6, resp = "surftemp", newdata = df_test), 3)
```

### Covariate effects
#### N2O
The conditional effects plots for the covariate effects on N2O suggested a similar effect of NO3, but interesting interactions between NO3 and lake area and between NO3 and surface temperature. For lake area, the effect was estimated to be larger and more negative at the highest levels of NO3 and slightly negative at the lowest level of NO3. For surface temperature, the effect was estimated to be largest and positive at the highest level of NO3 and negative at the lowest level of NO3.
```{r conditional_effects_mod_n2oF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=4}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("no3_cat"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("log_area:no3_cat"), 
                          plot = F)
p3 <- conditional_effects(n2o_mod6, 
                          resp = "n2o", 
                          effects = c("surftemp:no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]], 
                 plot(p2, plot = F)[[1]],
                 plot(p3, plot = F)[[1]])

rm(p1, p2, p3)

annotate_figure(plt, top = text_grob(expression(paste("N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

The estimated covariate effects on the shape parameter suggested a negative relationship with log(area) and, again, a positive relationship with NO3.
```{r conditional_effects_sigma_n2oF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2o",
                          dpar = "shape",
                          effects = c("log_area"),
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2o",
                          dpar = "shape",
                          effects = c("no3_cat"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, 
                top = text_grob(expression(paste("N2O: covariate effects on ", shape)),
                                color = "black",
                                face = "bold",
                                size = 14))
```

#### Equilibrium N2O
The estimated covariate effects on equilibrium N2O remained largely the same as estimated in the previous model.
```{r conditional_effects_mod_n2oeqF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq", 
                          effects = c("surftemp:log_elev"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq", 
                          effects = c("log_elev:surftemp"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", mu)), 
               color = "black", face = "bold", size = 14))
```

```{r conditional_effects_sigma_n2oeqF, echo=FALSE, message=FALSE, warning=FALSE, fig.align='center', fig.width=8, fig.height=2}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

p1 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq",
                          dpar = "shape",
                          effects = c("surftemp"), 
                          plot = F)

p2 <- conditional_effects(n2o_mod6, 
                          resp = "n2oeq",
                          dpar = "shape",
                          effects = c("log_elev"), 
                          plot = F)

plt <- ggarrange(plot(p1, plot = F)[[1]],
                 plot(p2, plot = F)[[1]])

rm(p1, p2)

annotate_figure(plt, top = text_grob(expression(paste("Equilibrium N2O: covariate effects on ", shape)), 
               color = "black", face = "bold", size = 14))
```

# Predict to population
As previously described, in order to make inferences to the population of interest, the final model above was used to, first, predict surface temperature in the target population, since it depended only on the fully observed covariates. Next, the predictive distribution for surface temperature was used, along with the relevant fully observed covariates, to predict NO3 in the target population. Finally, the predictive distributions for temperature and NO3 were used to predict the N2O responses. The code for these steps is outlined in the following.
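
The logic of this chained prediction can be sketched with a toy simulation before turning to the actual code. Everything in the block below is hypothetical: the coefficients, the distributions, and the simplification of NO3 to a binary high/low indicator are assumptions made for illustration only and do not reflect the fitted model.

```r
set.seed(3)
n_lakes <- 5  # hypothetical number of lakes in the frame
n_draws <- 4  # hypothetical number of predictive draws per lake

# One row per lake-by-draw combination, with lakes varying slowest
frame <- data.frame(
  lake     = rep(seq_len(n_lakes), each = n_draws),
  draw     = rep(seq_len(n_draws), times = n_lakes),
  log_area = rep(rnorm(n_lakes, mean = 2, sd = 1), each = n_draws)
)
n <- nrow(frame)

# Step 1: predict surface temperature from the fully observed covariate
frame$surftemp <- rnorm(n, mean = 25 - 0.5 * frame$log_area, sd = 2)

# Step 2: predict NO3 (here a binary indicator) using the predicted temperature
p_high <- plogis(-1 + 0.05 * (frame$surftemp - 25) - 0.2 * frame$log_area)
frame$no3_high <- rbinom(n, size = 1, prob = p_high)

# Step 3: predict N2O using the predicted temperature and predicted NO3
mu_n2o <- exp(2 + 0.5 * frame$no3_high - 0.01 * (frame$surftemp - 25))
frame$n2o <- rgamma(n, shape = 10, rate = 10 / mu_n2o)

head(frame)
```

Because each row carries its own predicted temperature and NO3 value into the next step, the uncertainty in the upstream predictions is propagated into the downstream N2O predictions.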

The first step used the final model to predict surface temperature for every lake in the target population, with the Julian date fixed at 205:
```{r predict_obsframe, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sframe.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")

predict_temp <- sframe %>%
  mutate(jdate = 205) %>%
  add_predicted_draws(n2o_mod6, resp=c("surftemp"), 
                      allow_new_levels = TRUE, 
                      cores =1, 
                      ndraws = 500) %>%
  mutate(surftemp = .prediction)

save(predict_temp, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")
```

NO3 was predicted next. Note that the posterior predictive distribution for NO3 was subsampled to a single draw per row of the prediction frame in order to minimize excess simulations.
```{r parallel_predict_draws, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

temp_X <- predict_temp %>% # select relevant columns as predictors
  ungroup() %>%
  select(WSA9,
         state,
         size_cat,
         log_area,
         surftemp)


rm(predict_temp) # reduce memory
gc()

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(5) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

system.time( # approx 26 hrs with 5 workers & 500 draws from PPD
predict_no3 <- foreach(sub_X = isplitRows(temp_X, chunkSize = 155299), 
                       .combine = 'c',
                       .packages = c("brms"),
                       .options.snow = opts
                       ) %dopar% {
                         apply(brms::posterior_predict(n2o_mod6,
                                                 newdata = sub_X,
                                                 resp = "no3cat",
                                                 allow_new_levels = T,
                                                 ndraws = 500,
                                                 cores = 1), 2, sample, 1)
                         }
)


close(pb)
parallel::stopCluster(cl)

save(predict_no3, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
```
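
The subsampling step, `apply(..., 2, sample, 1)`, keeps one randomly chosen posterior predictive draw for each column of the prediction matrix. A minimal sketch with a made-up matrix is below. The object `pp` is a hypothetical stand-in for the draws-by-rows matrix returned by `posterior_predict()`, and the four category codes are an assumption.

```r
set.seed(4)
n_draws <- 500  # draws in rows
n_rows  <- 6    # rows of newdata in columns

# Stand-in for a posterior predictive matrix of category codes
pp <- matrix(sample(1:4, n_draws * n_rows, replace = TRUE),
             nrow = n_draws, ncol = n_rows)

# Keep a single randomly chosen draw for each column
one_draw <- apply(pp, 2, sample, 1)

# Equivalent selection using explicit row indices
idx <- sample.int(n_draws, n_rows, replace = TRUE)
one_draw_idx <- pp[cbind(idx, seq_len(n_rows))]
```

The index-based form selects the same kind of subsample and avoids the special behavior of `sample()` when it is handed a single number, which matters only if a column ever holds one draw.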

Next, N2O and N2O-eq were predicted using the surface temperature and nitrate predictions along with the survey variables and known covariates. Again, the posterior predictive distribution was subsampled in order to reduce excess simulations.
```{r n2o_covariates_X, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

# Assemble dataframe containing relevant covariates (known and predicted)
n2o_X <- predict_temp %>%
  ungroup() %>%
  mutate(no3_cat = predict_no3) %>%
  select(WSA9,
         state,
         size_cat,
         log_area,
         surftemp,
         log_elev,
         no3_cat)

# clear objects to reduce memory overhead
rm(predict_no3, predict_temp) 
gc()

# save the predictors for n2o and n2oeq
save(n2o_X, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")
```

```{r parallel_predict, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_mod6.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_X.rda")

# set number of cores to use for parallel predictions
# and register the workers
cl <- parallel::makeCluster(6) 
doSNOW::registerDoSNOW(cl) 

# make a progress bar
pb <- txtProgressBar(max = 1500, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

# make predictions in parallel
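system.time(
predict_n2o <- foreach(sub_X = isplitRows(n2o_X, chunkSize = 155299),
                 .combine = rbind,
                 .options.snow = opts,
                 .packages = c("brms")) %dopar% {
  # with two responses, the result is a draws x rows x responses array
  pp <- posterior_predict(n2o_mod6,
                          newdata = sub_X,
                          resp = c("n2o", "n2oeq"),
                          allow_new_levels = TRUE,
                          ndraws = 500,
                          cores = 1)
  # keep one randomly chosen draw per row and take both responses
  # from that same draw, returning a two-column matrix
  idx <- cbind(sample.int(dim(pp)[1], dim(pp)[2], replace = TRUE),
               seq_len(dim(pp)[2]))
  cbind(pp[cbind(idx, 1)], pp[cbind(idx, 2)])
                   }
)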

close(pb)
parallel::stopCluster(cl)

colnames(predict_n2o) <- c("n2o", "n2oeq")

save(predict_n2o, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
```

Finally, the predictions for all four partially observed responses were assembled into a new dataframe for use in inference.
```{r assemble_predictions, eval=FALSE, include=TRUE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_n2o.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_no3.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/predict_temp.rda")

all_predictions <- predict_temp %>%
  ungroup() %>%
  mutate(no3cat = predict_no3) %>%
  bind_cols(predict_n2o) %>%
  mutate(n2osat = n2o / n2oeq, # calculate saturation ratio
         .row = rep(1:465897, each = 500),
         .draw = rep(seq(1,500, 1), 465897)) %>%
  mutate(area_ha = exp(log_area)) %>% # include area on ha scale
  select(WSA9,
         state,
         size_cat,
         area_ha,
         lat,
         lon,
         .row,
         .draw,
         surftemp,
         no3cat,
         n2o,
         n2oeq,
         n2osat)

rm(predict_n2o, predict_temp, predict_no3) # clean up workspace for RAM
gc()
 

save(all_predictions, file = "C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
```

# Population estimates
A number of estimates for the target population were assembled and are presented below. First, the full posterior predictive distributions for dissolved N2O, equilibrium N2O, and the saturation ratio were assessed. These distributions summarized the predicted distribution of concentrations or ratios for all lakes in the population of interest and included parameter uncertainty propagated through the model. Next, population means were assessed, followed by comparisons of some model-based estimates to previously calculated design-based estimates.

## Posterior predictive distributions
Below, density plots summarize the posterior predictive distributions of N2O and N2O-eq concentrations and of the saturation ratio across the target population of lakes, based on 500 draws from the posterior predictive distribution. Note that the x-axes were truncated (at 50 nmol/L for the concentrations and at 8 for the saturation ratio) for a clearer visualization of the bulk of the predicted distributions. For reference, the maximum predicted value was 4403.2 nmol/L for dissolved N2O, 20.4 nmol/L for equilibrium N2O, and 793.5 for the saturation ratio.
```{r plot_n2o_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2o)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 50) +
  xlab("Dissolved N2O concentration") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2oeq)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 50) +
  xlab("Equilibrium N2O concentration") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_posterior_preds, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  slice_sample(n=1e4) %>% # simple random sample 10k lakes
  ggplot(aes(x = n2osat)) +
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  scale_y_log10() +
  xlim(0, 8) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  xlab("N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

## Estimated means
### National
Below are density plots summarizing the posterior distribution of _means_ for N2O concentrations and the saturation ratio for the target population (i.e., all US lakes > 1 ha in the lower 48 states).
```{r plot_n2o_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_n2o = mean(n2o)) %>%
  ggplot(aes(x = mean_n2o)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_n2o = mean(n2oeq)) %>%
  ggplot(aes(x = mean_n2o)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_nat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(mean_sat = mean(n2o/n2oeq)) %>%
  ggplot(aes(x = mean_sat)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```
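
The posterior distribution of a population mean is obtained by averaging the predictions over all lakes within each draw, which yields one mean per draw. The sketch below shows this calculation in base R on simulated predictions. The data frame `pred` and its gamma-distributed values are hypothetical and only mimic the `.row`, `.draw`, and `n2o` columns used above.

```r
set.seed(5)
n_lakes <- 1000  # hypothetical population size
n_draws <- 200   # hypothetical number of predictive draws

# Long-format predictions: one row per lake-by-draw combination
pred <- data.frame(
  .row  = rep(seq_len(n_lakes), each = n_draws),
  .draw = rep(seq_len(n_draws), times = n_lakes),
  n2o   = rgamma(n_lakes * n_draws, shape = 4, rate = 0.5)
)

# Mean across all lakes within each draw: one value per draw
mean_by_draw <- tapply(pred$n2o, pred$.draw, mean)

# The spread of these values describes uncertainty in the population mean
range(mean_by_draw)
```

The resulting vector has one element per draw, and its distribution is what the density plots of the mean summarize.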

To illustrate the skewness in the predictive distribution for the saturation ratio, an estimate for the median ratio is shown below. The entire posterior distribution of the mean above was larger than 1, the ratio representing the boundary between under- and oversaturation. By comparison, the posterior distribution of the median below included only values less than 1, suggesting that although the mean saturation ratio was greater than 1, most lakes in the national population were undersaturated (i.e., ratio less than 1). In right-skewed distributions, the mean can be considerably larger than the median.
```{r plot_sat_nat_posterior_median, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>% 
  summarise(median_sat = median(n2o/n2oeq)) %>%
  ggplot(aes(x = median_sat)) + 
  stat_dist_slabinterval(.width = c(0.5, 0.95)) +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```
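
The gap between the mean and the median in a right-skewed distribution can be shown with a simple worked example. The sketch below uses a lognormal distribution with arbitrary parameters purely for illustration. It is not the distribution implied by the fitted model.

```r
# Arbitrary lognormal parameters on the log scale
mu    <- -0.1
sigma <- 1

# Closed-form median and mean of a lognormal distribution
median_theory <- exp(mu)                # about 0.90, below 1
mean_theory   <- exp(mu + sigma^2 / 2)  # about 1.49, above 1

# Check by simulation
set.seed(6)
sat <- rlnorm(1e5, meanlog = mu, sdlog = sigma)
c(median = median(sat),
  mean = mean(sat),
  prop_under = mean(sat < 1))  # share of values below 1
```

In this example more than half of the simulated ratios fall below 1 even though their mean exceeds 1, which is the same pattern described for the saturation ratio.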

Below is a plot of the posterior distribution of the proportion of undersaturated lakes at the national scale.
```{r plot_undersat_posterior_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row))) %>%
  ggplot(aes(x = prop_sat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated waterbodies") +
  ylab("density") +
  theme_tidybayes()
```

### Ecoregion
Below are posterior estimates of the means for dissolved and equilibrium N2O and the saturation ratio by WSA9 ecoregion.
```{r plot_n2o_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_n2o, 
                        y = reorder(ecoregion, mean_n2o)), 
                    quantiles = 100,
                    .width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2oeq = mean(n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_n2oeq, 
                        y = reorder(ecoregion, mean_n2oeq)), 
                    .width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_wsa9_posterior_mean, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_sat = mean(n2o/n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = mean_sat,
                        y = reorder(ecoregion, mean_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

A plot of the posterior estimates for the median saturation ratio below indicated, again, that most lakes in each ecoregion were undersaturated (i.e., median << 1).
```{r plot_sat_wsa9_posterior_median, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(median_sat = median(n2o/n2oeq), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = median_sat,
                        y = reorder(ecoregion, median_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

A plot of the estimates of the proportion of under-saturated lakes by ecoregion is below.
```{r plot_prop_sat_wsa9_posterior_median, echo=FALSE, fig.align='center', fig.height=8, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)), .groups = "drop") %>%
  mutate(ecoregion = WSA9) %>%
  ggplot() + 
  stat_slabinterval(aes(x = prop_sat,
                        y = reorder(ecoregion, prop_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated lakes") +
  ylab("density") +
  theme_tidybayes()
```
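
The per-draw proportion used in the chunk above can be checked on toy data. The sketch below is illustrative only: the `toy` data frame and its lognormal values are hypothetical stand-ins for `all_predictions`, and it assumes that every lake (`.row`) appears exactly once in every posterior draw (`.draw`). Under that assumption, `sum(n2osat < 1) / length(unique(.row))` is the same quantity as the mean of the logical indicator `n2osat < 1`.

```r
# Hypothetical stand-in for all_predictions: 50 lakes x 100 posterior draws
set.seed(42)
toy <- expand.grid(.row = 1:50, .draw = 1:100)
toy$n2osat <- rlnorm(nrow(toy), meanlog = -0.1, sdlog = 0.4)

# Proportion of undersaturated lakes per draw, written as in the chunk above
prop_by_count <- sapply(split(toy, toy$.draw), function(d) {
  sum(d$n2osat < 1) / length(unique(d$.row))
})

# Equivalent form: mean of the logical indicator within each draw
prop_by_mean <- tapply(toy$n2osat < 1, toy$.draw, mean)

# The two forms agree when each .row occurs once per .draw
print(all.equal(unname(prop_by_count), as.vector(prop_by_mean)))

# Posterior median and central 95% interval of the proportion
print(quantile(prop_by_mean, probs = c(0.025, 0.5, 0.975)))
```

If some rows were missing from some draws, the two forms would differ, so the equivalence is worth confirming before swapping one for the other.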

### State
State-level mean estimates are compared below. Points are posterior medians and bars span the 2.5th to 97.5th posterior percentiles. Density estimates were omitted to save plot space.
```{r plot_state_mean_n2o, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(mean_n2o), 1),
    LCL = round(quantile(mean_n2o, probs = 0.025), 1),
    UCL = round(quantile(mean_n2o, probs = 0.975), 1),
    .groups = "drop") %>% 
  select(state, estimate, LCL, UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate )) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("mean dissolved N2O") +
  scale_y_continuous(position = "left") + 
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```
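
The two-step summary used in the state-level chunks (a mean within each draw, followed by a median and percentiles across draws) recurs throughout this section. A minimal sketch of how it could be wrapped in a function is below. The helper name `summarise_group_mean()` and the `toy` data are hypothetical and are not part of the analysis; the sketch assumes `dplyr` >= 1.0 is installed.

```r
library(dplyr)

# Hypothetical helper: posterior median and central 95% interval of a
# per-draw group mean. `draws` needs a .draw column.
summarise_group_mean <- function(draws, group, value, digits = 2) {
  draws %>%
    group_by({{ group }}, .draw) %>%
    # step 1: one mean per group per posterior draw
    summarise(stat = mean({{ value }}), .groups = "drop") %>%
    group_by({{ group }}) %>%
    # step 2: summarise the distribution of those means across draws
    summarise(estimate = round(median(stat), digits),
              LCL = round(unname(quantile(stat, probs = 0.025)), digits),
              UCL = round(unname(quantile(stat, probs = 0.975)), digits),
              .groups = "drop")
}

# Toy stand-in for all_predictions: 20 lakes in 2 made-up states, 200 draws
set.seed(1)
toy <- expand.grid(.row = 1:20, .draw = 1:200)
toy$state <- ifelse(toy$.row <= 10, "AA", "BB")
toy$n2o <- rlnorm(nrow(toy),
                  meanlog = ifelse(toy$state == "AA", 2, 2.3),
                  sdlog = 0.3)

state_summary <- summarise_group_mean(toy, state, n2o)
print(state_summary)
```

A call such as `summarise_group_mean(all_predictions, state, n2oeq, digits = 1)` would then be expected to match the summary step of the equilibrium chunk, provided the column names are as used above.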

```{r plot_state_mean_n2oeq, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_n2oeq = mean(n2oeq), .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(mean_n2oeq), 1),
    LCL = round(quantile(mean_n2oeq, probs = 0.025), 1),
    UCL = round(quantile(mean_n2oeq, probs = 0.975), 1),
    .groups = "drop") %>% 
  select(state, estimate, LCL, UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate )) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("mean equilibrium N2O") +
  scale_y_continuous(position = "left") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

The plot below shows estimates of the mean (black circles) and median (grey circles) saturation ratio by state. The horizontal, dashed, black line at ratio = 1 marks the boundary between under- _vs._ oversaturation. Only a few states (e.g., NV, DE) had median estimates of 1 or greater, suggesting that, in most states, most lakes were undersaturated.
```{r plot_state_mean_median_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(mean_sat = mean(n2osat), 
            median_sat = median(n2osat),
            .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate_mean = round(median(mean_sat), 4),
    LCL_mean = round(quantile(mean_sat, probs = 0.025), 4),
    UCL_mean = round(quantile(mean_sat, probs = 0.975), 4),
    estimate_median = round(median(median_sat), 4),
    LCL_median = round(quantile(median_sat, probs = 0.025), 4),
    UCL_median = round(quantile(median_sat, probs = 0.975), 4),
    .groups = "drop") %>% 
  select(state, 
         estimate_mean, 
         estimate_median, 
         LCL_mean,
         LCL_median,
         UCL_mean,
         UCL_median) %>%
  mutate(state = forcats::fct_reorder(state, estimate_mean)) %>%
  ggplot(aes(x = state, y = estimate_mean )) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL_mean, ymax = UCL_mean),
                 position=position_dodge(width=0.5)) +
  geom_point(aes(x = state, y = estimate_median), 
             position=position_dodge(width=0.5),
             color = "grey",
             size = 2) +
  geom_linerange(aes(ymin = LCL_median, ymax = UCL_median),
                 position=position_dodge(width=0.5),
                 color = "gray") +
  ylab("mean and median N2O saturation ratio") +
  scale_y_continuous(position = "left") +
  geom_hline(yintercept = 1, color = "black", linetype = "dashed") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```
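
One reason to show both the mean and the median is that they can fall on opposite sides of 1 when the distribution of saturation ratios is right-skewed. The sketch below illustrates this with a lognormal distribution. The lognormal form and the parameter values are assumptions chosen for illustration only, not estimates from the fitted model.

```r
# Hypothetical lognormal distribution of saturation ratios
meanlog <- log(0.95)  # implies a median ratio of 0.95
sdlog   <- 0.5        # assumed spread on the log scale

pop_median <- exp(meanlog)                # median of a lognormal
pop_mean   <- exp(meanlog + sdlog^2 / 2)  # mean of a lognormal
prop_under <- plnorm(1, meanlog = meanlog, sdlog = sdlog)  # P(ratio < 1)

print(c(median = pop_median, mean = pop_mean, prop_under = prop_under))
```

In this toy case the median is below 1 and more than half of the distribution is undersaturated, yet the mean exceeds 1 because of the long right tail.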

Finally, the plot below shows the estimated proportion of undersaturated lakes in the target population for each state. Point estimates are posterior medians of the proportion, and bars span the central 95% interval (2.5th to 97.5th percentiles) of the posterior distribution.
```{r plot_state_prop_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(state, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)),
            .groups = "drop") %>%
  group_by(state) %>%
  summarise(estimate = round(median(prop_sat), 4),
    LCL = round(quantile(prop_sat, probs = 0.025), 4),
    UCL = round(quantile(prop_sat, probs = 0.975), 4),
    .groups = "drop") %>% 
  select(state, 
         estimate, 
         LCL,
         UCL) %>%
  mutate(state = forcats::fct_reorder(state, estimate)) %>%
  ggplot(aes(x = state, y = estimate)) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL, ymax = UCL),
                 position=position_dodge(width=0.5)) +
  ylab("Proportion of undersaturated lakes") +
  scale_y_continuous(position = "left") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

### Size category
The estimated means by size category are below for dissolved and equilibrium N2O and the saturation ratio. Median estimates for the saturation ratio are also shown.
```{r plot_n2o_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2o, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean dissolved N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_n2oeq_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2oeq),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2o, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  xlab("mean equilibrium N2O") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_size_posterior_mean, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2osat),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2o, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "black") +
  xlab("mean N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

```{r plot_sat_size_posterior_median, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = median(n2osat),
            .groups = "drop") %>%
  ggplot(aes(x = mean_n2o, y = size_cat)) + 
  stat_slabinterval(.width = c(0.5, 0.95)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "black") +
  xlab("median N2O saturation ratio") +
  ylab("density") +
  theme_tidybayes()
```

Mean (black circles) and median (grey circles) estimates of the saturation ratio are compared by size category below.
```{r plot_size_cat_mean_median_sat, echo=FALSE, fig.align='center', fig.height=5, fig.width=8, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), 
            median_sat = median(n2osat),
            .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate_mean = round(median(mean_sat), 2),
    LCL_mean = round(quantile(mean_sat, probs = 0.025), 2),
    UCL_mean = round(quantile(mean_sat, probs = 0.975), 2),
    estimate_median = round(median(median_sat), 2),
    LCL_median = round(quantile(median_sat, probs = 0.025), 2),
    UCL_median = round(quantile(median_sat, probs = 0.975), 2),
    .groups = "drop") %>% 
  select(size_cat, 
         estimate_mean, 
         estimate_median, 
         LCL_mean,
         LCL_median,
         UCL_mean,
         UCL_median) %>%
  mutate(size_cat = forcats::fct_reorder(size_cat, estimate_mean)) %>%
  ggplot(aes(x = size_cat, y = estimate_mean )) +
  geom_point(position=position_dodge(width=0.5),
             size = 2) +
  geom_linerange(aes(ymin = LCL_mean, ymax = UCL_mean),
                 position=position_dodge(width=0.5)) +
  geom_point(aes(x = size_cat, y = estimate_median), 
             position=position_dodge(width=0.5),
             color = "grey",
             size = 2) +
  geom_linerange(aes(ymin = LCL_median, ymax = UCL_median),
                 position=position_dodge(width=0.5),
                 color = "gray") +
  ylab("mean and median N2O saturation ratio") +
  scale_y_continuous(position = "left") +
  geom_hline(yintercept = 1, color = "black", linetype = "dashed") +
  theme_tidybayes() +
  theme(axis.text.x = element_text(size=9, angle=45))
```

And, finally, the estimated proportion of undersaturated lakes in the target population by size category is below.
```{r plot_prop_sat_size_cat_posterior_median, echo=FALSE, fig.align='center', fig.height=6, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(prop_sat = sum(n2osat < 1) / length(unique(.row)), .groups = "drop") %>%
  ggplot() + 
  stat_slabinterval(aes(x = prop_sat,
                        y = reorder(size_cat, prop_sat)),
                    .width = c(0.5, 0.95)) +
  xlab("Proportion of undersaturated lakes") +
  ylab("density") +
  theme_tidybayes()
```

## Model- _vs._ design-based
Below, estimates from the model-based approach are compared to design-based estimates. In general, the model estimates were similar to the design-based estimates. Model estimates were typically within the confidence bounds of the design-based estimates, but with much greater precision. Improved precision was expected due to the "shrinkage" induced by the multilevel parameterization, which allowed some "borrowing" of information across the various levels of the survey factors. 
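
The "shrinkage" idea can be sketched with the textbook normal-normal partial-pooling estimator, in which each group mean is a precision-weighted average of its own sample mean and the population mean. This is a simplified illustration, not the model fitted in this document: the group labels, sample sizes, and variance parameters below are hypothetical, and the variances are treated as known.

```r
# Hypothetical inputs for three groups
ybar  <- c(A = 12.0, B = 8.0, C = 9.5)  # group sample means
n     <- c(A = 3,    B = 40,  C = 12)   # group sample sizes
sigma <- 4    # within-group sd (assumed known)
mu    <- 9    # population mean (assumed known)
tau   <- 1.5  # between-group sd (assumed known)

# Weight on each group's own data: data precision / total precision
w <- (n / sigma^2) / (n / sigma^2 + 1 / tau^2)

# Partially pooled estimates are pulled toward mu, most strongly for small n
pooled <- w * ybar + (1 - w) * mu

# Standard errors without and with pooling
se_unpooled <- sigma / sqrt(n)
se_pooled   <- sqrt(1 / (n / sigma^2 + 1 / tau^2))

print(round(cbind(n, ybar, w, pooled, se_unpooled, se_pooled), 3))
```

Groups with few observations receive a small weight `w` and gain the most precision from pooling, which is the pattern expected for sparsely sampled levels of the survey factors.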

### Dissolved N2O
Below, national mean estimates for dissolved N2O from the model-based and design-based approaches were compared. The sample mean was also included as a naive reference.
```{r n2o_means_national, message=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_n2o = mean(n2o)) %>%
  summarise(estimate = round(median(mean_n2o), 2), # posterior median
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2)) %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o), 2),
          type = "sample") %>%
  print()
```

The black, vertical, dashed line in the figure below represents the mean of the sample. 
```{r plot_n2o_means_national, echo=FALSE, fig.align='center', fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_n2o = mean(n2o)) %>%
  summarise(estimate = round(median(mean_n2o), 2), # posterior median
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2)) %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  mutate(cl_width = round(UCL - LCL, 2)) %>%
  ggplot(aes(x = type, y = estimate, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = round(mean(df_model$n2o), 2), 
             linetype = "dashed",
             color = "black") +
  ylab("mean N2O concentration") +
  ggtitle("National estimate comparison") +
  coord_flip() + 
  theme_tidybayes()
```

Below, estimates were compared by ecoregion.
```{r n2o_mean_wsa9, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  bind_rows(cbind(n2o_survey_ests[-10,], type = rep("survey", 9))) %>%
  arrange(ecoregion) %>%
  print()
```

```{r plot_mean_n2o_wsa9, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  bind_rows(cbind(n2o_survey_ests[-10,], type = rep("survey", 9))) %>%
  ggplot(aes(x = ecoregion, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  coord_flip() + 
  theme_tidybayes() +
  ylab("mean dissolved N2O") +
  ggtitle("Ecoregion estimates comparison")
```
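
The two properties discussed for these comparisons (whether a model estimate falls inside the design-based interval, and how much narrower the model interval is) can also be tabulated directly. The sketch below uses a made-up `comparison` data frame with the same columns as the tables printed above; the region labels and all numbers are hypothetical and are not results from this analysis.

```r
# Hypothetical comparison table with the columns used above
comparison <- data.frame(
  ecoregion = rep(c("R1", "R2", "R3"), each = 2),
  type      = rep(c("model", "survey"), times = 3),
  estimate  = c(7.9, 8.1, 9.4, 10.2, 6.8, 6.1),
  LCL       = c(7.5, 7.0, 8.9, 8.6, 6.5, 5.2),
  UCL       = c(8.3, 9.2, 9.9, 11.8, 7.1, 6.6)
)

# One row per ecoregion, with model and survey columns side by side
merged <- merge(subset(comparison, type == "model"),
                subset(comparison, type == "survey"),
                by = "ecoregion", suffixes = c("_model", "_survey"))

# Does the model point estimate fall within the design-based interval?
merged$model_in_survey_ci <- merged$estimate_model >= merged$LCL_survey &
  merged$estimate_model <= merged$UCL_survey

# Ratio of interval widths; values below 1 mean the model interval is narrower
merged$width_ratio <- (merged$UCL_model - merged$LCL_model) /
  (merged$UCL_survey - merged$LCL_survey)

print(merged[, c("ecoregion", "model_in_survey_ci", "width_ratio")])
```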

Means were compared according to size categories below.
```{r table_size_mean_n2o, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_n2o), 1),
    LCL = round(quantile(mean_n2o, probs = 0.025), 1),
    UCL = round(quantile(mean_n2o, probs = 0.975), 1),
    .groups = "drop") %>%
  mutate(type = "model") %>%
  bind_rows(cbind(n2o_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  mutate(size = forcats::fct_reorder(size, estimate)) %>%
  mutate(cl_width = UCL - LCL) %>%
  arrange(size) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  print()
```

```{r plot_size_mean_n2o, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/n2o_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_n2o = mean(n2o), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_n2o), 2),
    LCL = round(quantile(mean_n2o, probs = 0.025), 2),
    UCL = round(quantile(mean_n2o, probs = 0.975), 2),
    .groups = "drop") %>% 
  #mutate(size_cat = factor(size_cat)) %>%
  mutate(type = "model") %>%
  select(size_cat, estimate, LCL, UCL, type) %>%
  bind_rows(cbind(n2o_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size_cat = forcats::fct_reorder(size_cat, estimate)) %>%
  ggplot(aes(x = size_cat, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  coord_flip() + 
  theme_tidybayes() +
  ylab("mean dissolved N2O") +
  ggtitle("Size category estimates comparison")
```

### Saturation
Below, the same comparisons were made for the saturation estimates.
```{r table_nat_sat_mean, message=FALSE, warning=FALSE, fig.align='center', fig.height=6, fig.width=8}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>% 
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  add_row(estimate = round(mean(df_model$n2o / df_model$n2o_eq), 3),
          type = "sample") %>%
  print()
```

```{r plot_nat_sat_mean, echo=FALSE, fig.align='center', fig.height=2, fig.width=4, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/df_model.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(.draw) %>%
  summarise(mean_sat = mean(n2osat)) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3)) %>%  
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests[10, 2:4], type = rep("survey", 1))) %>%
  ggplot(aes(x = type, y = estimate, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = round(mean(df_model$n2o / df_model$n2o_eq), 3), 
             linetype = "dashed",
             color = "black") +
  ylab("mean N2O saturation ratio") +
  ggtitle("National estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

Below, estimates of the mean saturation ratio were compared by ecoregion.
```{r plot_wsa9_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests.rda")

all_predictions %>%
  group_by(WSA9, .draw) %>%
  summarise( mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(WSA9) %>%
  summarise( estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>% 
  mutate(ecoregion = factor(WSA9)) %>%
  mutate(type = "model") %>%
  mutate(ecoregion = forcats::fct_reorder(ecoregion, estimate)) %>%
  select(ecoregion, estimate, LCL, UCL, type) %>%
  bind_rows(cbind(sat_survey_ests[-10,], type = rep("survey", 9))) %>%
  mutate(cl_width = UCL - LCL) %>%
  ggplot(aes(x = ecoregion, y = estimate, group = type, color = type)) +
  geom_point(size = 2, position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  geom_hline(yintercept = 1, color = "lightgrey") +
  ylab("mean N2O saturation ratio") +
  ggtitle("Ecoregion estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

```{r table_size_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests_size.rda")

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3)) %>%
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  mutate(cl_width = UCL - LCL) %>%
  arrange(size) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  print()
```

```{r plot_size_sat_mean, echo=FALSE, fig.align='center', fig.height=4, fig.width=6, message=FALSE, warning=FALSE}
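load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/all_predictions.rda")
load("C:/Users/rmartin/OneDrive - Environmental Protection Agency (EPA)/Documents/AE_Reservoirs/DissolvedGasNla/modelFiles/sat_survey_ests_size.rda")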

all_predictions %>%
  group_by(size_cat, .draw) %>%
  summarise(mean_sat = mean(n2osat), .groups = "drop") %>%
  group_by(size_cat) %>%
  summarise(estimate = round(median(mean_sat), 3),
    LCL = round(quantile(mean_sat, probs = 0.025), 3),
    UCL = round(quantile(mean_sat, probs = 0.975), 3),
    .groups = "drop") %>%
  mutate(type = "model") %>%
  bind_rows(cbind(sat_survey_ests_size, type = rep("survey", 5))) %>%
  mutate(size = factor(size_cat)) %>%
  select(size, estimate, LCL, UCL, type) %>% 
  ggplot(aes(x = size, y = estimate, group = type, color = type)) +
  geom_point(position=position_dodge(width=0.5)) +
  geom_linerange(aes(ymin = LCL, ymax = UCL) , position=position_dodge(width=0.5)) +
  scale_colour_manual(values = c("black", "grey")) +
  ylab("mean N2O saturation ratio") +
  ggtitle("Size category estimates comparison") +
  coord_flip() + 
  theme_tidybayes()
```

# References

# Session Info
```{r session}
sessionInfo()
```